TL;DR: This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.
Abstract: This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We show how repeated bottom-up, top-down processing used in conjunction with intermediate supervision is critical to improving the performance of the network. We refer to the architecture as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions. State-of-the-art results are achieved on the FLIC and MPII benchmarks outcompeting all recent methods.
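The successive pooling and upsampling steps named in this abstract can be illustrated with a minimal 1D sketch in plain Python. The function name `hourglass`, max pooling by a factor of 2, nearest-neighbour upsampling, and the additive skip connection are assumptions made for illustration; the learned convolutions, the stacking of several hourglasses, and the intermediate supervision are omitted.

```python
def hourglass(x, depth):
    """Toy 1D hourglass: pool down `depth` times, then upsample back with skips.

    Assumes len(x) is divisible by 2 ** depth. No learned layers are included.
    """
    if depth == 0 or len(x) < 2:
        return x[:]
    # Bottom-up step: max pooling with window 2 and stride 2.
    pooled = [max(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    # Process the lower resolution recursively.
    low = hourglass(pooled, depth - 1)
    # Top-down step: nearest-neighbour upsampling by a factor of 2.
    up = [v for v in low for _ in range(2)]
    # Skip connection: combine features from both scales by addition.
    return [s + u for s, u in zip(x, up)]


one_level = hourglass([1, 3, 2, 0], depth=1)
two_levels = hourglass([1, 3, 2, 0], depth=2)
```

The output has the same length as the input, so several such modules could in principle be chained end to end.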
TL;DR: A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection; it is learned end-to-end by optimizing a multi-task loss.
Abstract: A unified deep neural network, denoted the multi-scale CNN (MS-CNN), is proposed for fast multi-scale object detection. The MS-CNN consists of a proposal sub-network and a detection sub-network. In the proposal sub-network, detection is performed at multiple output layers, so that receptive fields match objects of different scales. These complementary scale-specific detectors are combined to produce a strong multi-scale object detector. The unified network is learned end-to-end, by optimizing a multi-task loss. Feature upsampling by deconvolution is also explored, as an alternative to input upsampling, to reduce the memory and computation costs. State-of-the-art object detection performance, at up to 15 fps, is reported on datasets, such as KITTI and Caltech, containing a substantial number of small objects.
TL;DR: A new method to address the problem of depth map super resolution in which a high-resolution (HR) depth map is inferred from a LR depth map and an additional HR intensity image of the same scene is presented.
Abstract: Depth boundaries often lose sharpness when upsampling from low-resolution (LR) depth maps, especially at large upscaling factors. We present a new method to address the problem of depth map super resolution in which a high-resolution (HR) depth map is inferred from an LR depth map and an additional HR intensity image of the same scene. We propose a Multi-Scale Guided convolutional network (MSG-Net) for depth map super resolution. MSG-Net complements LR depth features with HR intensity features using a multi-scale fusion strategy. Such multi-scale guidance allows the network to better adapt to upsampling of both fine- and large-scale structures. Specifically, the rich hierarchical HR intensity features at different levels progressively resolve ambiguity in depth map upsampling. Moreover, we employ a high-frequency domain training method to not only reduce training time but also facilitate the fusion of depth and intensity features. With the multi-scale guidance, MSG-Net achieves state-of-the-art performance for depth map upsampling.
TL;DR: In this article, the Laplacian pyramid transform for signals on Euclidean domains is adapted to analyze high-dimensional data residing on the vertices of a weighted graph.
Abstract: Multiscale transforms designed to process analog and discrete-time signals and images cannot be directly applied to analyze high-dimensional data residing on the vertices of a weighted graph, as they do not capture the intrinsic topology of the graph data domain. In this paper, we adapt the Laplacian pyramid transform for signals on Euclidean domains so that it can be used to analyze high-dimensional data residing on the vertices of a weighted graph. Our approach is to study existing methods and develop new methods for the four fundamental operations of graph downsampling, graph reduction, and filtering and interpolation of signals on graphs. Equipped with appropriate notions of these operations, we leverage the basic multiscale constructs and intuitions from classical signal processing to generate a transform that yields both a multiresolution of graphs and an associated multiresolution of a graph signal on the underlying sequence of graphs.
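For orientation, one level of the classical Laplacian pyramid on a regularly sampled 1D signal can be sketched as follows. This is the Euclidean construction that the abstract takes as its starting point, not the graph version the paper develops; the function names, the pairwise averaging filter, and the sample-repetition interpolator are assumptions chosen to keep the example short.

```python
def analyze(x):
    """Split x into a coarse approximation and a detail (prediction error) signal.

    Assumes len(x) is even.
    """
    # Downsample: average each pair of neighbouring samples.
    coarse = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    # Interpolate the coarse signal back to full length by repetition.
    predicted = [c for c in coarse for _ in range(2)]
    # Detail: what the interpolated coarse signal fails to explain.
    detail = [a - p for a, p in zip(x, predicted)]
    return coarse, detail


def synthesize(coarse, detail):
    """Invert `analyze`: interpolate the coarse signal and add the detail back."""
    predicted = [c for c in coarse for _ in range(2)]
    return [p + d for p, d in zip(predicted, detail)]


signal = [1.0, 3.0, 2.0, 6.0]
coarse, detail = analyze(signal)
restored = synthesize(coarse, detail)
```

Applying `analyze` again to `coarse` would give the next pyramid level; on a graph, the downsampling and interpolation steps are the operations that need new definitions.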
TL;DR: An algorithm is presented that accelerates a large class of image processing operators by fitting local curves that map the input to the output; the fitted model faithfully reproduces state-of-the-art operators for tone mapping, style transfer, and recoloring.
Abstract: We present an algorithm to accelerate a large class of image processing operators. Given a low-resolution reference input and output pair, we model the operator by fitting local curves that map the input to the output. We can then produce a full-resolution output by evaluating these low-resolution curves on the full-resolution input. We demonstrate that this faithfully models state-of-the-art operators for tone mapping, style transfer, and recoloring. The curves are computed by lifting the input into a bilateral grid and then solving for the 3D array of affine matrices that best maps input color to output color per x, y, intensity bin. We enforce a smoothness term on the matrices which prevents false edges and noise amplification. We can either globally optimize this energy, or quickly approximate a solution by locally fitting matrices and then enforcing smoothness by blurring in grid space. This latter option reduces to joint bilateral upsampling [Kopf et al. 2007] or the guided filter [He et al. 2013], depending on the choice of parameters. The cost of running the algorithm is reduced to the cost of running the original algorithm at greatly reduced resolution, as fitting the curves takes about 10 ms on mobile devices, and 1-2 ms on desktop CPUs, and evaluating the curves can be done with a simple GPU shader.
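The core idea of fitting a curve on a low-resolution input/output pair and evaluating it on the full-resolution input can be sketched for a single grid cell and a single channel. The helper names `fit_affine` and `apply_affine`, the scalar affine model, and the stand-in operator `2x + 0.1` are assumptions for illustration; the bilateral grid, the per-bin matrices, and the smoothness term from the abstract are not modelled here.

```python
def fit_affine(xs, ys):
    """Least-squares fit of y = a * x + b to paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var if var > 0 else 0.0  # flat input: fall back to a constant
    b = mean_y - a * mean_x
    return a, b


def apply_affine(a, b, xs):
    """Evaluate the fitted curve on new (for example full-resolution) samples."""
    return [a * x + b for x in xs]


# Low-resolution input and the output of a (hypothetical) expensive operator.
low_in = [0.0, 0.5, 1.0]
low_out = [2.0 * x + 0.1 for x in low_in]

a, b = fit_affine(low_in, low_out)

# The expensive operator is never run at full resolution; only the curve is.
full_in = [0.0, 0.25, 0.5, 0.75, 1.0]
full_out = apply_affine(a, b, full_in)
```

In the method described above, one such fit would exist per x, y, intensity bin, which is what lets the model follow spatially varying operators.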
TL;DR: Zhang et al. propose a new image super-resolution method that jointly learns the feature extraction, upsampling, and HR reconstruction modules, yielding a completely end-to-end trainable deep CNN.
Abstract: One impressive advantage of convolutional neural networks (CNNs) is their ability to automatically learn feature representation from raw pixels, eliminating the need for hand-designed procedures. However, recent methods for single image super-resolution (SR) fail to maintain this advantage. They utilize CNNs in two decoupled steps, i.e., first upsampling the low resolution (LR) image to the high resolution (HR) size with hand-designed techniques (e.g., bicubic interpolation), and then applying CNNs on the upsampled LR image to reconstruct HR results. In this paper, we seek an alternative and propose a new image SR method, which jointly learns the feature extraction, upsampling and HR reconstruction modules, yielding a completely end-to-end trainable deep CNN. As opposed to existing approaches, the proposed method conducts upsampling in the latent feature space with filters that are optimized for the task of image SR. In addition, the HR reconstruction is performed in a multi-scale manner to simultaneously incorporate both short- and long-range contextual information, ensuring more accurate restoration of HR images. To facilitate network training, a new training approach is designed, which jointly trains the proposed deep network with a relatively shallow network, leading to faster convergence and superior performance. The proposed method is extensively evaluated on widely adopted data sets and improves on the performance of state-of-the-art methods by a considerable margin. Moreover, in-depth ablation studies are conducted to verify the contribution of different network designs to image SR, providing additional insights for future research.
TL;DR: This work addresses the problem of how to transfer fine structures of guidance signals to input images, restoring noisy or altered structures in a data-dependent framework by jointly leveraging structural information of guidance and input images.
Abstract: Filtering images using a guidance signal, a process called joint or guided image filtering, has been used in various tasks in computer vision and computational photography, particularly for noise reduction and joint upsampling. The aim is to transfer the structure of the guidance signal to an input image, restoring noisy or altered image structure. The main drawbacks of such a data-dependent framework are that it does not consider structural differences between guidance and input images, and it is not robust to outliers. We propose a novel SD (for static/dynamic) filter to address these problems in a unified framework, and jointly leverage structural information from guidance and input images. SD filtering is formulated as a nonconvex optimization problem, which is solved by the majorization-minimization algorithm. The proposed algorithm converges quickly while guaranteeing a local minimum. The SD filter effectively controls the underlying image structure at different scales, and can handle a variety of types of data from different sensors. It is robust to outliers and other artifacts such as gradient reversal and global intensity shift, and has good edge-preserving smoothing properties. We demonstrate the flexibility and effectiveness of the proposed SD filter in a variety of applications, including depth upsampling, scale-space filtering, texture removal, flash/non-flash denoising, and RGB/NIR denoising.
TL;DR: This work presents a novel method for accurate and efficient upsampling of sparse depth data, guided by high-resolution imagery that determines globally consistent solutions and preserves fine details and sharp depth boundaries.
Abstract: We present a novel method for accurate and efficient upsampling of sparse depth data, guided by high-resolution imagery. Our approach goes beyond the use of intensity cues only and additionally exploits object boundary cues through structured edge detection and semantic scene labeling for guidance. Both cues are combined within a geodesic distance measure that allows for boundary-preserving depth interpolation while utilizing local context. We model the observed scene structure by locally planar elements and formulate the upsampling task as a global energy minimization problem. Our method determines globally consistent solutions and preserves fine details and sharp depth boundaries. In our experiments on several public datasets at different levels of application, we demonstrate superior performance of our approach over the state-of-the-art, even for very sparse measurements.
TL;DR: The SegNet decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling, which eliminates the need for learning to upsample.
Abstract: We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3] and DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance. SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and the most efficient inference memory-wise as compared to other architectures.
We also provide a Caffe implementation of SegNet and a web demo at http://mi.eng.cam.ac.uk/projects/segnet/ .
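The index-based upsampling described in this abstract can be illustrated on a single 2D feature map in plain Python. The function names `max_pool_with_indices` and `max_unpool` and the non-overlapping 2×2 window are assumptions for this sketch; it is not taken from the Caffe implementation, and the trainable filters that densify the sparse map afterwards are left out.

```python
def max_pool_with_indices(x, k=2):
    """Non-overlapping k x k max pooling that also records where each max came from.

    Assumes the height and width of x are divisible by k.
    """
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, k):
        pooled_row, index_row = [], []
        for j in range(0, w, k):
            best = (i, j)
            # Scan the window and remember the position of the largest value.
            for di in range(k):
                for dj in range(k):
                    if x[i + di][j + dj] > x[best[0]][best[1]]:
                        best = (i + di, j + dj)
            pooled_row.append(x[best[0]][best[1]])
            index_row.append(best)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, h, w):
    """Place each pooled value back at its recorded position; all else stays zero."""
    out = [[0] * w for _ in range(h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 7, 2],
               [6, 2, 3, 1]]
pooled, indices = max_pool_with_indices(feature_map)
sparse = max_unpool(pooled, indices, 4, 4)
```

Nothing in `max_unpool` is learned, which is the sense in which the decoder avoids learning to upsample; only the stored indices need to be kept from the encoder.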
TL;DR: Quantitative and qualitative results from experiments on the KITTI Database, using LIDAR point clouds only, show very satisfactory performance of the approach introduced in this work, which relies on local spatial interpolation using sliding-window (mask) technique and the Bilateral Filter.
Abstract: High resolution depth-maps, obtained by upsampling sparse range data from a 3D-LIDAR, find applications in many fields ranging from sensory perception to semantic segmentation and object detection. Upsampling is often based on combining data from a monocular camera to compensate for the low resolution of a LIDAR. This paper, on the other hand, introduces a novel framework to obtain a dense depth-map solely from a single LIDAR point cloud, a research direction that has been barely explored. The formulation behind the proposed depth-mapping process relies on local spatial interpolation, using a sliding-window (mask) technique, and on the Bilateral Filter (BF) where the variable of interest, the distance from the sensor, is considered in the interpolation problem. In particular, the BF is conveniently modified to perform depth-map upsampling such that the edges (foreground-background discontinuities) are better preserved by means of a proposed method which influences the range-based weighting term. Other methods for spatial upsampling are discussed, evaluated and compared in terms of different error measures. This paper also researches the role of the mask's size in the performance of the implemented methods. Quantitative and qualitative results from experiments on the KITTI Database, using LIDAR point clouds only, show very satisfactory performance of the approach introduced in this work.
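A sliding-window bilateral interpolation over sparse depth samples might look like the sketch below. The function name `bilateral_fill`, the Gaussian weights, the parameter defaults, and in particular the choice of the nearest sample in the window as the reference for the range term are assumptions for illustration; the abstract does not specify how the paper modifies the range-based weighting.

```python
import math


def bilateral_fill(depth, radius=1, sigma_s=1.0, sigma_r=1.0):
    """Fill missing cells (None) of a sparse depth grid from valid neighbours."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for i in range(h):
        for j in range(w):
            if depth[i][j] is not None:
                continue  # measured samples are kept as they are
            # Valid samples inside the (2 * radius + 1)^2 sliding window.
            neighbours = [(a, b, depth[a][b])
                          for a in range(max(0, i - radius), min(h, i + radius + 1))
                          for b in range(max(0, j - radius), min(w, j + radius + 1))
                          if depth[a][b] is not None]
            if not neighbours:
                continue  # nothing to interpolate from; leave the cell empty
            # Assumed range reference: the sample closest to the sensor.
            ref = min(d for _, _, d in neighbours)
            num = den = 0.0
            for a, b, d in neighbours:
                w_space = math.exp(-((a - i) ** 2 + (b - j) ** 2) / (2 * sigma_s ** 2))
                w_range = math.exp(-((d - ref) ** 2) / (2 * sigma_r ** 2))
                num += w_space * w_range * d
                den += w_space * w_range
            out[i][j] = num / den
    return out


# A foreground sample at 1 m next to a background sample at 10 m.
filled = bilateral_fill([[1.0, None, 10.0]])
```

With these settings the empty cell takes a value very close to 1.0 rather than the plain average of 5.5, which is the edge-preserving behaviour the range term is meant to provide.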
TL;DR: In this article, a convolutional neural network (CNN) is used to segment the human heart from 3D MRI data, where the loss function used to validate the CNN model may specifically account for missing data, which allows for use of a larger training set.
Abstract: Systems and methods for automated segmentation of anatomical structures, such as the human heart. The systems and methods employ convolutional neural networks (CNNs) to autonomously segment various parts of an anatomical structure represented by image data, such as 3D MRI data. The convolutional neural network utilizes two paths, a contracting path which includes convolution/pooling layers, and an expanding path which includes upsampling / convolution layers. The loss function used to validate the CNN model may specifically account for missing data, which allows for use of a larger training set. The CNN model may utilize multi-dimensional kernels (e.g., 2D, 3D, 4D, 6D), and may include various channels which encode spatial data, time data, flow data, etc. The systems and methods of the present disclosure also utilize CNNs to provide automated detection and display of landmarks in images of anatomical structures.
TL;DR: In this article, a method for accurate and efficient upsampling of sparse depth data, guided by high-resolution imagery, is presented. The approach goes beyond the use of intensity cues alone and additionally exploits object boundary cues through structured edge detection and semantic scene labeling for guidance.
Abstract: We present a novel method for accurate and efficient upsampling of sparse depth data, guided by high-resolution imagery. Our approach goes beyond the use of intensity cues only and additionally exploits object boundary cues through structured edge detection and semantic scene labeling for guidance. Both cues are combined within a geodesic distance measure that allows for boundary-preserving depth interpolation while utilizing local context. We model the observed scene structure by locally planar elements and formulate the upsampling task as a global energy minimization problem. Our method determines globally consistent solutions and preserves fine details and sharp depth boundaries. In our experiments on several public datasets at different levels of application, we demonstrate superior performance of our approach over the state-of-the-art, even for very sparse measurements.
TL;DR: This paper proposes luma aware chroma downsampling and upsampling algorithms to jointly improve the quality of the chroma image reconstruction, and explores the applicability of the proposed scheme to screen content compression, aiming to improve the decoded chroma image quality for display.
Abstract: Screen content images are originally captured in a full-chroma format. The chroma downsampling, which is commonly applied to the chroma component in screen content image representation and processing (e.g., YUV4:2:0 compression), will significantly degrade the image quality and create annoying artifacts such as blur and color shifting. To tackle this problem, in this paper we propose luma aware chroma downsampling and upsampling algorithms to jointly improve the quality of the chroma image reconstruction. Guided by the luma information, the chroma upsampling algorithm is proposed with the utilization of major color and index map representation. The geometric information-based linear mapping is developed to transfer the structure of luma to the interpolated chroma. Subsequently, the error sensitivity of the upsampling method is analyzed, and content dependent downsampling algorithm is presented to minimize the error sensitivity function. We further explore the applicability of the proposed scheme in the scenario of screen content compression, targeting at improving the decoded chroma image quality for display. Extensive experimental results demonstrate the viability and efficiency of the proposed scheme.
TL;DR: A deep interpretation of an earlier face hallucination framework that achieves state-of-the-art results under challenging conditions, and a new loss function for super-resolution that combines reconstruction error with a learned face quality measure in an adversarial setting, are presented.
Abstract: Face hallucination, which is the task of generating a high-resolution face image from a low-resolution input image, is a well-studied problem that is useful in widespread application areas. Face hallucination is particularly challenging when the input face resolution is very low (e.g., 10 x 12 pixels) and/or the image is captured in an uncontrolled setting with large pose and illumination variations. In this paper, we revisit the algorithm introduced in [1] and present a deep interpretation of this framework that achieves state-of-the-art under such challenging scenarios. In our deep network architecture the global and local constraints that define a face can be efficiently modeled and learned end-to-end using training data. Conceptually our network design can be partitioned into two sub-networks: the first one implements the holistic face reconstruction according to global constraints, and the second one enhances face-specific details and enforces local patch statistics. We optimize the deep network using a new loss function for super-resolution that combines reconstruction error with a learned face quality measure in adversarial setting, producing improved visual results. We conduct extensive experiments in both controlled and uncontrolled setups and show that our algorithm improves the state of the art both numerically and visually.
TL;DR: This work shows that increasing input image resolution (i.e. upsampling) offers up to 12 percentage-points higher accuracy compared to an off-the-shelf baseline, and finds situations where earlier/shallower layers of CNN provide higher accuracy than later/deeper layers.
Abstract: The ability to automatically detect other vehicles on the road is vital to the safety of partially-autonomous and fully-autonomous vehicles. Most of the high-accuracy techniques for this task are based on R-CNN or one of its faster variants. In the research community, much emphasis has been applied to using 3D vision or complex R-CNN variants to achieve higher accuracy. However, are there more straightforward modifications that could deliver higher accuracy? Yes. We show that increasing input image resolution (i.e. upsampling) offers up to 12 percentage-points higher accuracy compared to an off-the-shelf baseline. We also find situations where earlier/shallower layers of CNN provide higher accuracy than later/deeper layers. We further show that shallow models and upsampled images yield competitive accuracy. Our findings contrast with the current trend towards deeper and larger models to achieve high accuracy in domain specific detection tasks.
TL;DR: This extended guided filtering approach for depth map upsampling outperforms other state-of-the-art approaches by using a high-resolution color image as a guide and applying an onion-peeling filtering procedure that exploits local gradient information of depth images.
Abstract: The authors address the problem of depth map upsampling using a corresponding high-resolution color image. The depth map is captured by low-resolution time-of-flight cameras paired with a high-resolution RGB camera. Inspired by guided image filtering, the proposed method not only uses the structure of the high-resolution color image as guidance, it also exploits local gradient information of depth images to suppress potential texture-copying artifacts. In addition, the authors introduce onion-peel-order filtering that predicts depth values from outside inward in a concentric-layer order, which avoids depth bleeding during the propagation process. Quantitative and qualitative experimental results demonstrate the effectiveness and robustness of this approach over prior depth map upsampling methods.
TL;DR: A novel single anisotropic 3-D MR image upsampling method via sparse representation and overcomplete dictionary that is trained from in-plane high resolution slices to upsample in the out-of-plane dimensions that is more accurate than classical interpolation and does not require extra training sets.
Abstract: In magnetic resonance (MR), hardware limitations, scanning time, and patient comfort often result in the acquisition of anisotropic 3-D MR images. Enhancing image resolution is desired but has been very challenging in medical image processing. Super resolution reconstruction based on sparse representation and overcomplete dictionary has been lately employed to address this problem; however, these methods require extra training sets, which may not be always available. This paper proposes a novel single anisotropic 3-D MR image upsampling method via sparse representation and overcomplete dictionary that is trained from in-plane high resolution slices to upsample in the out-of-plane dimensions. The proposed method, therefore, does not require extra training sets. Abundant experiments, conducted on simulated and clinical brain MR images, show that the proposed method is more accurate than classical interpolation. When compared to a recent upsampling method based on the nonlocal means approach, the proposed method did not show improved results at low upsampling factors with simulated images, but generated comparable results with much better computational efficiency in clinical cases. Therefore, the proposed approach can be efficiently implemented and routinely used to upsample MR images in the out-of-plane views for radiologic assessment and postacquisition processing.
TL;DR: In this paper, the authors concatenate the global details and the smooth upsampled image into a tensor and apply a sequence of nonlinear convolutions to the tensor using a convolutional neural network to produce the upsampled image.
Abstract: A method upsamples an image using a non-linear fully connected neural network to produce only global details of an upsampled image and interpolates the image to produce a smooth upsampled image. The method concatenates the global details and the smooth upsampled image into a tensor and applies a sequence of nonlinear convolutions to the tensor using a convolutional neural network to produce the upsampled image.
TL;DR: A novel blind SR method is proposed to improve the spatial resolution of video sequences, while the overall point spread function of the imaging system, motion fields, and noise statistics are unknown.
Abstract: Super resolution (SR) for real-life video sequences is a challenging problem due to the complex nature of the motion fields. In this paper, a novel blind SR method is proposed to improve the spatial resolution of video sequences, while the overall point spread function of the imaging system, motion fields, and noise statistics are unknown. To estimate the blur(s), first, a nonuniform interpolation SR method is utilized to upsample the frames, and then, the blur(s) is(are) estimated through a multi-scale process. The blur estimation process is initially performed on a few emphasized edges and gradually on more edges as the iterations continue. Also for faster convergence, the blur is estimated in the filter domain rather than the pixel domain. The high-resolution frames are estimated using a cost function that has the fidelity and regularization terms of type Huber–Markov random field to preserve edges and fine details. The fidelity term is adaptively weighted at each iteration using a masking operation to suppress artifacts due to inaccurate motions. Very promising results are obtained for real-life videos containing detailed structures, complex motions, fast-moving objects, deformable regions, or severe brightness changes. The proposed method outperforms the state of the art in all performed experiments through both subjective and objective evaluations. The results are available online at http://lyle.smu.edu/~rajand/Video_SR/ .
TL;DR: In this article, a convolutional neural network image super-resolution reconstruction method based on learning rate self-adaptation is proposed.
Abstract: The invention discloses a convolutional neural network image super-resolution reconstruction method based on learning rate self-adaptation. The method comprises the following steps: S1, blurring and downsampling the images in a high-resolution training image set to obtain a corresponding low-resolution training image set; S2, enlarging the low-resolution images by bicubic interpolation; S3, inputting the low-resolution images processed in step S2 into the trained learning-rate self-adaptive convolutional neural network to obtain reconstructed high-resolution images. The technical scheme provided by the invention achieves excellent super-resolution reconstruction performance.
TL;DR: In this paper, a cost minimization problem to generate an output image from the input array is mapped onto regularly-spaced vertices in a multidimensional vertex space, based on an association between pixels of the reference image and the vertices.
Abstract: Example embodiments may allow for the efficient, edge-preserving filtering, upsampling, or other processing of image data with respect to a reference image. A cost-minimization problem to generate an output image from the input array is mapped onto regularly-spaced vertices in a multidimensional vertex space. This mapping is based on an association between pixels of the reference image and the vertices, and between elements of the input array and the pixels of the reference image. The problem is then solved to determine vertex disparity values for each of the vertices. Pixels of the output image can be determined based on determined vertex disparity values for respective one or more vertices associated with each of the pixels. This fast, efficient image processing method can be used to enable edge-preserving image upsampling, image colorization, semantic segmentation of image contents, image filtering or de-noising, or other applications.
TL;DR: A feature representation of local edges by means of a multilevel filtering network, namely, the multilevel modified finite Radon transform network (MMFRTN), is presented, and experimental results demonstrate the effectiveness of the proposed method over some state-of-the-art methods.
Abstract: A local line-like feature is the most important discriminative information in the image upsampling scenario. In recent example-based upsampling methods, grayscale and gradient features are often adopted to describe the local patches, but these simple features cannot accurately characterize complex patches. In this paper, we present a feature representation of local edges by means of a multilevel filtering network, namely, multilevel modified finite Radon transform network (MMFRTN). In the proposed MMFRTN, the MFRT is utilized in the filtering layer to extract the local line-like feature; the nonlinear layer is set to be a simple local binary process; for the feature-pooling layer, we concatenate the mapped patches as the feature of the local patch. Then, we propose a new example-based upsampling method by means of the MMFRTN feature. Experimental results demonstrate the effectiveness of the proposed method over some state-of-the-art methods.
TL;DR: In this paper, a method for 3D point cloud registration is proposed, which includes generating a first upsampled 3D point cloud by identifying at least one missing point in the 3D point cloud, determining an intensity of neighboring pixels, and filling the missing point with a filler point using depth information from depth values in the point cloud that correspond with the neighboring pixels.
Abstract: A method for three-dimensional point cloud registration includes generating a first upsampled three-dimensional point cloud by identifying at least one missing point in the three-dimensional point cloud, determining an intensity of neighboring pixels, filling the at least one missing point in the three-dimensional point cloud with a filler point using depth information from depth values in the three-dimensional point cloud that correspond with the neighboring pixels, generating a second upsampled three-dimensional point cloud by determining at least one local area of the first upsampled three-dimensional point cloud, determining entropies of pixels in the two-dimensional image that correspond with the at least one local area, adding at least one point to the at least one local area based on the entropies of pixels in the two-dimensional image and a scaled entropy threshold, and registering the second upsampled three-dimensional point cloud with a predetermined three-dimensional model.
TL;DR: This work proposes to enhance the resolution of dynamic depth videos with non-rigidly moving objects based on a new data model that uses densely upsampled, and cumulatively registered versions of the observed low resolution depth frames.
TL;DR: In this article, an image upsampling system, a training method, and an upsampling method are provided. The feature images of an image are obtained using the convolutional network, and upsampling is performed with the muxer layer, which synthesizes every n×n feature images in the input signal into a feature image with the resolution amplified by n×n times; in the upsampling procedure with the muxer layer, information of the respective feature images in the input signal is recorded in the generated feature image(s) without loss.
Abstract: An image upsampling system, a training method thereof and an image upsampling method are provided, the feature images of an image are obtained by using the convolutional network, upsampling processing is performed on the images with the muxer layer to synthesize every n×n feature images in the input signal into a feature image with the resolution amplified by n×n times, in the upsampling procedure with the muxer layer, information of respective feature images in the input signal is recorded in the generated feature image(s) without loss; and thus, every time when the image passes through a muxer layer with an upsampling multiple of n, the image resolution can be increased by n×n times.
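The muxer step described above can be illustrated with a minimal sketch. The abstract does not specify the exact pixel arrangement, so the interleaving order below, and the function names `mux` and `demux`, are assumptions made for illustration only. The sketch combines n×n feature maps of size H×W into one map of size nH×nW and shows that the operation can be inverted, which is one way to read "without loss".

```python
def mux(features, n):
    """Interleave n*n feature maps (each H x W) into one (n*H) x (n*W) map.

    Assumed layout: map k fills position (k // n, k % n) of every n x n output cell.
    """
    if len(features) != n * n:
        raise ValueError("expected n*n feature maps")
    h, w = len(features[0]), len(features[0][0])
    out = [[0] * (w * n) for _ in range(h * n)]
    for k, fmap in enumerate(features):
        dy, dx = divmod(k, n)  # offset of this map inside each n x n cell
        for y in range(h):
            for x in range(w):
                out[y * n + dy][x * n + dx] = fmap[y][x]
    return out


def demux(image, n):
    """Inverse of mux: recover the n*n feature maps from the interleaved map."""
    h, w = len(image) // n, len(image[0]) // n
    return [
        [[image[y * n + dy][x * n + dx] for x in range(w)] for y in range(h)]
        for dy in range(n)
        for dx in range(n)
    ]
```

For example, with n = 2, four 2×2 feature maps become a single 4×4 map, and `demux(mux(features, 2), 2)` returns the original four maps.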
TL;DR: In this paper, a single image super-resolution reconstruction method combining deep learning and gradient transformation was proposed, where a cost function was reconstructed by using the input low-resolution image and the converted gradient as constraints, and a gradient descent method was used to optimize the reconstructed cost function to acquire a final output high-resolution image.
Abstract: The invention discloses a single image super-resolution reconstruction method combining deep learning and gradient transformation. The method comprises the following steps: a super-resolution method based on deep learning is used to upsample an input low-resolution image to acquire an upsampled image; a gradient operator is used to extract the gradient of the upsampled image; a deep convolutional neural network is used to convert the extracted gradient; a cost function is reconstructed by using the input low-resolution image and the converted gradient as constraints; a gradient descent method is used to optimize the reconstructed cost function to acquire a final output high-resolution image. According to the single image super-resolution reconstruction method provided by the invention, the reconstructed image has a fine structure in the subjective visual effect, is free of artifacts, and has a high objective evaluation parameter value. The invention provides an effective single image super-resolution reconstruction method.
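The final optimization step can be sketched in one dimension. The abstract does not give the cost function, so everything below is an assumption for illustration: the signal is 1D, downsampling is modelled as averaging adjacent pairs, the target gradient `g` stands in for the network-converted gradient, and the cost is a data term plus a weighted gradient term, E(x) = Σ (D(x)_i − y_i)² + λ Σ (x_{j+1} − x_j − g_j)². The names `cost`, `cost_gradient` and `reconstruct` are hypothetical.

```python
def cost(x, y, g, lam):
    """Assumed cost: low-resolution consistency plus gradient consistency."""
    data = sum(((x[2 * i] + x[2 * i + 1]) / 2 - y[i]) ** 2 for i in range(len(y)))
    grad = sum((x[j + 1] - x[j] - g[j]) ** 2 for j in range(len(x) - 1))
    return data + lam * grad


def cost_gradient(x, y, g, lam):
    """Analytic derivative of cost() with respect to every entry of x."""
    d = [0.0] * len(x)
    for i in range(len(y)):
        r = (x[2 * i] + x[2 * i + 1]) / 2 - y[i]
        d[2 * i] += r          # derivative of r**2 is 2 * r * 0.5
        d[2 * i + 1] += r
    for j in range(len(x) - 1):
        r = x[j + 1] - x[j] - g[j]
        d[j + 1] += 2 * lam * r
        d[j] -= 2 * lam * r
    return d


def reconstruct(x0, y, g, lam=1.0, step=0.1, iters=300):
    """Plain gradient descent starting from an initial upsampled signal x0."""
    x = list(x0)
    for _ in range(iters):
        d = cost_gradient(x, y, g, lam)
        x = [xi - step * di for xi, di in zip(x, d)]
    return x
```

The default step size of 0.1 is chosen to be stable for `lam=1.0` in this toy setting; a different weight or a 2D image operator would need its own step size.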
TL;DR: In this article, an unbalanced data classification method based on adaptive upsampling is proposed, which includes the following steps of calculating the total of positive samples to be newly generated; calculating the probability density distribution for each positive sample by taking the Euclidean distance as the metric; determining the number of the new samples of the positive sample; generating a new positive sample and adding the newly generated positive sample points to an original unbalanced training set to make the positive and negative samples be same in number, namely, obtaining a new balance training set including n positive samples and n negative
Abstract: The invention relates to an unbalanced data classification method based on adaptive upsampling. The method includes the following steps of calculating the total of positive samples to be newly generated; calculating the probability density distribution for each positive sample by taking the Euclidean distance as the metric; determining the number of the new samples to be generated of the positive sample; generating a new positive sample and adding the newly generated positive sample points to an original unbalanced training set to make the positive and negative samples be same in number, namely, obtaining a new balance training set including n positive samples and n negative samples; and training the newly generated balance training set by means of an Adaboost algorithm and obtaining a final classification model after the iteration for T times. According to the invention, the classification performance of the unbalanced dataset is improved.
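The sample-generation steps above can be sketched as follows. The abstract does not say how the probability density is computed or how new points are generated, so this sketch borrows an ADASYN-style rule as an assumption: each positive sample is weighted by the share of negatives among its k nearest neighbours (Euclidean distance), and new points are drawn on the segment between two positive samples. The function name `adaptive_oversample` and its parameters are hypothetical, and the AdaBoost training stage is not shown.

```python
import math
import random


def adaptive_oversample(pos, neg, k=3, seed=0):
    """Return synthetic positive points so that positives match negatives in count."""
    rng = random.Random(seed)
    total = len(neg) - len(pos)  # step 1: number of positives to create
    if total <= 0:
        return []
    if len(pos) < 2:
        raise ValueError("need at least two positive samples to interpolate")

    # Step 2: weight each positive by the share of negatives among its k nearest
    # neighbours (assumed density rule), then normalise into a distribution.
    labelled = [(p, 1) for p in pos] + [(q, 0) for q in neg]
    weights = []
    for i, p in enumerate(pos):
        others = [(math.dist(p, x), y) for j, (x, y) in enumerate(labelled) if j != i]
        others.sort(key=lambda t: t[0])
        weights.append(sum(1 for _, y in others[:k] if y == 0) / k)
    s = sum(weights)
    density = [w / s for w in weights] if s > 0 else [1 / len(pos)] * len(pos)

    # Step 3: per-sample counts; largest-remainder rounding keeps the sum exact.
    raw = [d * total for d in density]
    counts = [int(r) for r in raw]
    order = sorted(range(len(pos)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[: total - sum(counts)]:
        counts[i] += 1

    # Step 4: generate each new point between a positive and another positive.
    new_points = []
    for i, c in enumerate(counts):
        for _ in range(c):
            j = rng.choice([m for m in range(len(pos)) if m != i])
            t = rng.random()
            new_points.append(tuple(a + t * (b - a) for a, b in zip(pos[i], pos[j])))
    return new_points
```

Appending the returned points to the positive set gives a training set with equal numbers of positive and negative samples, which could then be passed to any boosting implementation.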
TL;DR: A novel architecture to conduct the equivalent of the deconvolution operation globally and acquire dense predictions is proposed, which leads to improved performance of state-of-the-art semantic segmentation models on the PASCAL VOC 2012 benchmark, reaching 74.0% mean IU accuracy on the test set.
Abstract: Semantic image segmentation is a principal problem in computer vision, where the aim is to correctly classify each individual pixel of an image into a semantic label. Its widespread use in many areas, including medical imaging and autonomous driving, has fostered extensive research in recent years. Empirical improvements in tackling this task have primarily been motivated by successful exploitation of Convolutional Neural Networks (CNNs) pre-trained for image classification and object recognition. However, the pixel-wise labelling with CNNs has its own unique challenges: (1) an accurate deconvolution, or upsampling, of low-resolution output into a higher-resolution segmentation mask and (2) an inclusion of global information, or context, within locally extracted features. To address these issues, we propose a novel architecture to conduct the equivalent of the deconvolution operation globally and acquire dense predictions. We demonstrate that it leads to improved performance of state-of-the-art semantic segmentation models on the PASCAL VOC 2012 benchmark, reaching 74.0% mean IU accuracy on the test set.
TL;DR: A method to filter speckle noise based on compressive sensing (CS) is proposed. CS has recently been demonstrated to reconstruct images from sampling below the Nyquist rate, and with the proposed method speckle noise can be greatly decreased while preserving the sharpness of the image.
Abstract: In holographic reconstruction, speckle noise is a serious factor that may degrade the image quality greatly. Several methods have been proposed, so far, to filter speckle from hologram reconstruction. The first approach is based on averaging several speckle patterns. The second solution is to apply a filter on the reconstructed image. In the first case, several holograms should be acquired, while the compromise between speckle reduction and edge preservation is usually a challenge in the case of digital filtering. We propose a method to filter speckle noise based on compressive sensing (CS). CS is a method that has been demonstrated recently to reconstruct images from sampling below the Nyquist rate. By applying the CS algorithm several times to the hologram reconstruction with different initial downsampling, several versions of the same image can be reconstructed with slightly different speckle patterns. Then, speckle noise can be greatly decreased while preserving sharpness of the image. We demonstrate the effectiveness of our proposed method with simulations as well as with holograms acquired by the phase-shifting method.
TL;DR: Li et al. proposed a self-guided residual interpolation method to estimate a high-resolution depth map from an input low-resolution depth map, which can outperform state-of-the-art depth map upsampling algorithms.
Abstract: In this paper, we propose a simple and effective depth upsampling technique using self-guided residual interpolation. The original residual interpolation requires guidance information such as a high-resolution RGB color image. However, self-guided residual interpolation requires only a single depth map. In the proposed algorithm, a tentative estimation of a high-resolution depth map is first generated from an input low-resolution depth map. Then, re-interpolation is applied to the residual domain, which is defined by differences between the input depth map and the tentative estimate. A precise high-resolution depth map is obtainable by interpolating in the residual domain. Experimental results demonstrate that our algorithm can outperform state-of-the-art depth map upsampling algorithms.
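The residual-domain idea can be sketched on a 1D signal. This is a simplified illustration under stated assumptions, not the paper's algorithm: the tentative estimate here is a linear interpolation followed by a 3-tap box filter, and the residual is re-interpolated linearly, whereas the abstract does not name the estimators it uses. The function names are hypothetical.

```python
def linear_upsample(samples, factor):
    """Linearly interpolate so that output[i * factor] == samples[i]."""
    out = []
    for i in range(len(samples) - 1):
        for step in range(factor):
            t = step / factor
            out.append((1 - t) * samples[i] + t * samples[i + 1])
    out.append(samples[-1])
    return out


def smooth(signal):
    """3-tap box filter with edge replication (assumed tentative estimator)."""
    n = len(signal)
    return [
        (signal[max(i - 1, 0)] + signal[i] + signal[min(i + 1, n - 1)]) / 3
        for i in range(n)
    ]


def residual_interpolation(low_res, factor):
    """Upsample by interpolating the residual between input and tentative estimate."""
    tentative = smooth(linear_upsample(low_res, factor))  # tentative high-res estimate
    # Residual: input samples minus the tentative estimate at the same positions.
    residual = [lr - tentative[i * factor] for i, lr in enumerate(low_res)]
    residual_hr = linear_upsample(residual, factor)  # re-interpolate the residual
    return [t + r for t, r in zip(tentative, residual_hr)]
```

A property of this construction is that the output agrees with the low-resolution input at the original sample positions (up to floating-point rounding), because the interpolated residual restores whatever the tentative estimate changed there.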