TL;DR: A learning-based approach to the problem of detecting objects in still, gray-scale images that makes use of a sparse, part-based representation is developed and a critical evaluation of the approach under the proposed standards is presented.
Abstract: We study the problem of detecting objects in still, gray-scale images. Our primary focus is the development of a learning-based approach to the problem that makes use of a sparse, part-based representation. A vocabulary of distinctive object parts is automatically constructed from a set of sample images of the object class of interest; images are then represented using parts from this vocabulary, together with spatial relations observed among the parts. Based on this representation, a learning algorithm is used to automatically learn to detect instances of the object class in new images. The approach can be applied to any object with distinguishable parts in a relatively fixed spatial configuration; it is evaluated here on difficult sets of real-world images containing side views of cars, and is seen to successfully detect objects in varying conditions amidst background clutter and mild occlusion. In evaluating object detection approaches, several important methodological issues arise that have not been satisfactorily addressed in the previous work. A secondary focus of this paper is to highlight these issues, and to develop rigorous evaluation standards for the object detection problem. A critical evaluation of our approach under the proposed standards is presented.
TL;DR: A novel approach to multiresolution signal-level image fusion is presented for accurately transferring visual information from any number of input image signals, into a single fused image without loss of information or the introduction of distortion.
Abstract: A novel approach to multiresolution signal-level image fusion is presented for accurately transferring visual information from any number of input image signals, into a single fused image without loss of information or the introduction of distortion. The proposed system uses a "fuse-then-decompose" technique realized through a novel, fusion/decomposition system architecture. In particular, information fusion is performed on a multiresolution gradient map representation domain of image signal information. At each resolution, input images are represented as gradient maps and combined to produce new, fused gradient maps. Fused gradient map signals are processed, using gradient filters derived from high-pass quadrature mirror filters to yield a fused multiresolution pyramid representation. The fused output image is obtained by applying, on the fused pyramid, a reconstruction process that is analogous to that of conventional discrete wavelet transform. This new gradient fusion significantly reduces the amount of distortion artefacts and the loss of contrast information usually observed in fused images obtained from conventional multiresolution fusion schemes. This is because fusion in the gradient map domain significantly improves the reliability of the feature selection and information fusion processes. Fusion performance is evaluated through informal visual inspection and subjective psychometric preference tests, as well as objective fusion performance measurements. Results clearly demonstrate the superiority of this new approach when compared to conventional fusion systems.
TL;DR: The aim of this paper is to build a minimum weight spanning tree (MST) of an image in order to find region borders quickly in a bottom-up ‘stimulus-driven’ way based on local differences in a specific feature.
Abstract: The regions' internal properties (color, texture, ...) help to identify them, and their external relations (adjacency, inclusion, ...) are used to build groups of regions having a particular consistent meaning in a more abstract context. Low-level cue image segmentation in a bottom-up way cannot and should not produce a complete final “good” segmentation. We present a hierarchical partitioning of images using a pairwise similarity function on a graph-based representation of an image. The aim of this paper is to build a minimum weight spanning tree (MST) of an image in order to find region borders quickly in a bottom-up ‘stimulus-driven’ way based on local differences in a specific feature.
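As a rough illustration of the MST idea, the sketch below builds a minimum weight spanning tree over a 4-connected pixel grid, using the absolute intensity difference between neighbours as the edge weight. The abstract does not say which MST algorithm or which feature is used; Kruskal's algorithm with union-find, intensity as the feature, and all function names here are assumptions made for illustration only.

```python
def grid_edges(image):
    """Yield (weight, u, v) for 4-connected neighbours.

    Assumption: the weight is the absolute intensity difference.
    """
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:
                yield abs(image[r][c] - image[r][c + 1]), u, u + 1
            if r + 1 < rows:
                yield abs(image[r][c] - image[r + 1][c]), u, u + cols


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def minimum_spanning_tree(image):
    """Kruskal's algorithm: take edges by increasing weight, skip cycles."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))
    tree = []
    for weight, u, v in sorted(grid_edges(image)):
        root_u, root_v = find(parent, u), find(parent, v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


# Hypothetical 2x3 image: a dark region on the left, a bright one on the right.
image = [
    [10, 11, 50],
    [10, 12, 52],
]
mst = minimum_spanning_tree(image)
heaviest = max(mst)  # the edge that bridges the two regions
```

Removing the heaviest tree edges splits the tree into connected components, which is one plausible way to read region borders off the MST; in this toy image the single heavy edge separates the dark pixels from the bright ones.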
TL;DR: This paper models the super-resolution image as a Markov random field (MRF) and uses maximum a posteriori (MAP) estimation to derive a cost function, which is then optimized to recover the high-resolution field.
TL;DR: This paper applies a new multiple frame integration method to minimize variation of the image background, and presents a new iterative text line decomposition method that extracts accurate text bounding boxes from candidate text areas.
Abstract: Captions in videos often play an important role in video information indexing and retrieval. In this paper, we present a novel video caption detection approach. We first apply a new multiple frame integration (MFI) method to minimize variation of the background of the image. A time-based minimum (or maximum) pixel value search is employed and a Sobel edge map is used to determine the mode of search. Then block-based text detection is performed, i.e., a small window is used to scan the image and each block is classified as text or non-text, using Sobel edges as features. We use a two-level pyramid to detect various text sizes. Finally, we present a new iterative text line decomposition method, and accurate text bounding boxes are extracted from candidate text areas. Experimental results show that the proposed approach achieves high precision and recall.
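The time-based minimum (or maximum) pixel value search can be sketched as a per-pixel reduction over a stack of frames. This is only a minimal sketch: the function name and the `mode` parameter are hypothetical, and the Sobel-based choice of search mode described in the abstract is not reproduced here, so the mode is passed in by the caller.

```python
def integrate_frames(frames, mode="min"):
    """Per-pixel minimum or maximum over time.

    frames: list of equally sized 2D lists of gray values.
    mode:   "min" suits bright captions (the background can only get darker),
            "max" suits dark captions. Choosing the mode automatically is
            outside the scope of this sketch.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [pick(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical frames: column 0 is a stable bright caption pixel (200),
# column 1 is a changing background.
frames = [
    [[200, 30], [200, 90]],
    [[200, 80], [200, 10]],
    [[200, 55], [200, 40]],
]
integrated = integrate_frames(frames, mode="min")
```

Pixels that stay constant over time, such as caption text, survive the reduction unchanged, while the moving background collapses toward its darkest (or brightest) value, which increases the contrast between the two.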
TL;DR: The novel algorithm is applied to fuse charge-coupled device (CCD) and synthetic aperture radar (SAR) images, and the fusion result is compared with those of some other fusion methods through some performance evaluation measures for fusion effect.
Abstract: Based on the principle of the pulse-coupled neural network (PCNN), a novel algorithm for multisensor image fusion is presented. First, a contrast pyramid decomposition of the source images is performed, and the contrast pyramids are then used as the input of the PCNN. The contrast is selected based on the number of output pulses of the PCNN to realize image fusion. The novel algorithm utilizes the global features of the source images because the PCNN has global coupling and pulse synchronization characteristics. It accords with the physiological characteristics of the human visual neural system. The novel algorithm is applied to fuse charge-coupled device (CCD) and synthetic aperture radar (SAR) images, and the fusion result is compared with those of some other fusion methods through some performance evaluation measures for fusion effect. Comparison results show that the novel fusion algorithm is effective.
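TL;DR: In this article, a method for operating a medical imaging system is presented, which includes receiving an image data set of a region of interest in a first dimensional representation, reducing the dimensionality of the image data set to a second dimensional representation, selecting a feature of interest in the second dimensional representation, and generating an image of the selected feature in the first dimensional representation.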
Abstract: A method for operating a medical imaging system is provided. The method includes receiving an image data set of a region of interest in a first dimensional representation, reducing the dimensionality of the image data set to a second dimensional representation, selecting a feature of interest in the second dimensional representation, and generating an image of the selected feature in the first dimensional representation.
TL;DR: An overview is presented of object-based and image-based representations of objects by their interiors that achieve the goal of being able to answer both types of queries with one representation and without possibly having to examine every cell.
Abstract: An overview is presented of object-based and image-based representations of objects by their interiors. The representations are distinguished by the manner in which they can be used to answer two fundamental queries in database applications: (1) Feature query: given an object, determine its constituent cells (i.e., their locations in space). (2) Location query: given a cell (i.e., a location in space), determine the identity of the object (or objects) of which it is a member as well as the remaining constituent cells of the object (or objects). Regardless of the representation that is used, the generation of responses to the feature and location queries is facilitated by building an index (i.e., the result of a sort) either on the objects or on their locations in space, and implementing it using an access structure that correlates the objects with the locations. Assuming the presence of an access structure, implicit (i.e., image-based) representations are described that are good for finding the objects associated with a particular location or cell (i.e., the location query), while requiring that all cells be examined when determining the locations associated with a particular object (i.e., the feature query). In contrast, explicit (i.e., object-based) representations are good for the feature query, while requiring that all objects be examined when trying to respond to the location query. The goal is to be able to answer both types of queries with one representation and without possibly having to examine every cell. Representations are presented that achieve this goal by imposing containment hierarchies on either space (i.e., the cells in the space in which the objects are found), or objects. In the former case, space is aggregated into successively larger-sized chunks (i.e., blocks), while in the latter, objects are aggregated into successively larger groups (in terms of the number of objects that they contain). 
The former is applicable to image-based interior-based representations of which the space pyramid is an example. The latter is applicable to object-based interior-based representations of which the R-tree is an example. The actual mechanics of many of these representations are demonstrated in the VASCO JAVA applets found at http://www.cs.umd.edu/~hjs/quadtree/index.html.
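The trade-off between the two queries can be made concrete with a toy example. The sketch below stores the same two objects once implicitly (a grid of cells labelled with object names) and once explicitly (a map from each object to its cells). The data and function names are hypothetical, and no access structure or containment hierarchy is modelled; the point is only which query needs a full scan in which representation.

```python
# Implicit (image-based) representation: each cell stores its object, if any.
grid = [
    ["A", "A", None],
    ["B", "A", None],
]

# Explicit (object-based) representation: each object stores its cells.
objects = {
    "A": [(0, 0), (0, 1), (1, 1)],
    "B": [(1, 0)],
}


def location_query_implicit(grid, cell):
    """Direct lookup: which object occupies this cell?"""
    row, col = cell
    return grid[row][col]


def feature_query_implicit(grid, name):
    """Must examine every cell to collect the cells of one object."""
    return [
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value == name
    ]


def feature_query_explicit(objects, name):
    """Direct lookup: which cells make up this object?"""
    return objects[name]


def location_query_explicit(objects, cell):
    """Must examine every object to find the ones containing this cell."""
    return [name for name, cells in objects.items() if cell in cells]
```

The hierarchical representations named in the abstract (the space pyramid and the R-tree) aim to avoid the full scans in `feature_query_implicit` and `location_query_explicit` by aggregating cells or objects into larger groups.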
TL;DR: This work proposes an efficient, stochastic mosaic representation, in which the color of each pixel is represented with a mixture of Gaussians, and the parameters of the Gaussian mixture are learned on-the-fly.
Abstract: This work addresses the problem of mosaic construction from image sequences. We propose an incremental, globally consistent method in which each new frame of the sequence is aligned to the mosaic that has been constructed up to that time instant. We propose an efficient, stochastic mosaic representation, in which the color of each pixel is represented with a mixture of Gaussians. The parameters of the Gaussian mixture are learned on-the-fly. The proposed method is particularly suited for long image sequences, since it prevents accumulation of small alignment errors and maintains constant memory requirements. Results for synthetic and real image sequences are presented.
TL;DR: This paper proposes an image watermarking scheme based on steerable pyramid transform to embed invisible and robust watermark and it has been confirmed by experiments and comparisons with many existing non-blind techniques that the watermark information embedded by the proposed technique is robust to JPEG compression, additive noise, and median filtering.
TL;DR: Application of TPR in a selected area of the astrocytoma enabled us to observe the morphology and spatial distribution of neoplastic astrocytic nuclei, which encircled an adjacent blood vessel.
Abstract: This paper describes a new methodology for three-dimensional (3D) representation of biological structures contained in a series of sections, using an illustrative example. Spatial reconstruction of a specific area of an astrocytoma biopsy was carried out with alignment of the serial sections at an accuracy of 0.01% (or 1 µm cm−1), using the truncated pyramid representation (TPR) methodology. TPR includes: (a) serial tissue sectioning in a ribbon form; (b) alignment of the serial sections based on the properties of a ‘truncated pyramid’; (c) identification and localization of structures in every section using a field frame, and representation of the contours of the structures in every section as topographic contours (charting); (d) artificial reconstruction of the missing space between serial sections, by drawing intermediate contours based on the prototype contours of successive sections in order to provide smoother and more elegant representation of the volumes (complementation); and (e) 3D reconstruction. Application of TPR in a selected area of the astrocytoma enabled us to observe the morphology and spatial distribution of neoplastic astrocytic nuclei, which encircled an adjacent blood vessel.
TL;DR: In this paper, a 3D extension of the steerable pyramid is proposed to analyze volumes with a desired number of filters, which can be steered to any orientation fixed by the user, and synthesized using a limited number of basis filters.
Abstract: The objective of this work is the detection of 3D directional structures. The detection is based on steerable filters, which can be steered to any orientation fixed by the user, and are synthesized using a limited number of basis filters. These filters are used in a recursive multi-scale transform: the steerable pyramid. 2D multiscale approaches using oriented filters have proved to be efficient at detecting such curvilinear patterns. We develop a 3D extension of the steerable pyramid to analyze volumes with a desired number of filters.
TL;DR: This paper introduces a surface relaxation operator that makes it possible to build a non-uniform subdivision at a low computational cost, and generalizes the relaxation operator to attributes such as color, texture, temperature, etc.
Abstract: The concept of multiresolution analysis applied to irregular meshes has become more and more important. Previous contributions proposed a variety of methods using simplification and/or subdivision algorithms to build a mesh pyramid. In this paper, we propose a multiresolution analysis framework for irregular meshes with attributes. Our framework is based on simplification and subdivision algorithms to build a mesh pyramid. We introduce a surface relaxation operator that makes it possible to build a non-uniform subdivision at a low computational cost. Furthermore, we generalize the relaxation operator to attributes such as color, texture, temperature, etc. The attribute analysis gives more information on the analysed models, allowing more complete processing. We show the efficiency of our framework through a number of applications including filtering, denoising and adaptive simplification.
TL;DR: Experimental results show that the proposed method outperforms conventional image fusion methods.
Abstract: In this paper, a novel image fusion method based on the expectation maximization (EM) algorithm and steerable pyramid is proposed. The registered images are first decomposed by using steerable pyramid. The EM algorithm is used to fuse the image components in the low frequency band. The selection method involving the informative importance measure is applied to those in the high frequency band. The final fused image is then computed by taking the inverse transform on the composite coefficient representations. Experimental results show that the proposed method outperforms conventional image fusion methods.
TL;DR: It is shown empirically that independent component analysis is able to capture some intuitive natural image categories when applied on histograms of outputs of ordinary Gabor-like filters, indicating that maximizing the independence or sparseness of features may be a meaningful strategy even on higher levels of image processing.
Abstract: Statistical methods, such as independent component analysis, have been successful in learning local low-level features from natural image data. Here we extend these methods for learning high-level representations of whole images or scenes. We show empirically that independent component analysis is able to capture some intuitive natural image categories when applied on histograms of outputs of ordinary Gabor-like filters. This can be taken as an indication that maximizing the independence or sparseness of features may be a meaningful strategy even on higher levels of image processing, for such advanced functionality as object recognition or image retrieval from databases.
TL;DR: The performance of a sensor, with and without modulation, working under these conditions, is analyzed with two approaches: in laboratory experiments with a pyramid wavefront sensor system working in open- and closed compensation and through numerical simulations.
Abstract: The use of a pyramid wavefront sensor without any kind of modulating device, dynamic or static, is a tempting idea that is being considered in the actual design of some wavefront sensing systems. However, such a system has not yet been fully studied with respect to the effect of static non-common path aberrations, which in an extreme case would leave the system working in a saturated regime. Here we analyze the performance of a sensor, with and without modulation, working under these conditions, with two approaches: in laboratory experiments with a pyramid wavefront sensor system working in open- and closed compensation and through numerical simulations.
TL;DR: In this paper, an edge- and texture-data synthesized pyramid image fusion method is proposed, in which the linear relations between the binomial Gauss filter and the texture and edge extraction filters are considered, singular value decomposition is adopted to find the corresponding coefficients of the edge and texture images, these features of each scale image are used to express each layer of the decomposed image, and fusion is performed with a fusion policy based on a similarity measure and a remarkability measure.
Abstract: The invention is an edge- and texture-data synthesized pyramid image fusion method. It first establishes a pyramid structure based on edge and texture features, considering the linear relations between the binomial Gauss filter and the texture extraction filter as well as the edge extraction filter; it adopts singular value decomposition to find the corresponding coefficients of the edge and texture images, then uses these features of each scale image to express each layer of the decomposed image, and finally performs fusion with a fusion policy based on a similarity measure and a remarkability measure. It improves the quality of the image by a large margin and has important significance and practical value for the follow-up processing and image display of various application systems.
TL;DR: In this paper, the pyramid sensor, generalised as a lenslet array at the focal plane, is compared with the Shack-Hartmann wavefront sensor; simulations show equivalent performance in open loop when no modulation is applied, and a significantly better wavefront estimate than the Shack-Hartmann sensor when the array is modulated.
Abstract: The Shack-Hartmann wavefront sensor operates by subdividing the complex field in the aperture plane of the telescope with a lenslet array and forming low resolution images of the object. An alternative wavefront sensing scheme can be derived from placing a lenslet array at the focal plane of the aperture and forming low resolution images of the aperture. This arrangement can be viewed as the generalisation of the pyramid sensor and enables direct comparisons of the pyramid sensor with the Shack-Hartmann sensor. In particular, in this paper the performance of the reconstructor of the two sensors is investigated. Simulation results demonstrate that the lenslet array at the focal plane has equivalent performance to the Shack-Hartmann sensor in open loop when no modulation is applied to the lenslet array. However, when the array is modulated in a manner akin to the pyramid sensor, subdivision at the focal plane provides a significantly better wavefront estimate than the Shack-Hartmann sensor.
TL;DR: In this article, a sky-ground representation is proposed for describing scenes of environment at a local place, where the acquired spherical image is divided into two parts, sky part and ground part, along the horizon according to their positions, above or below the horizon.
Abstract: This paper proposes a new representation, called a sky-ground representation, for describing scenes of environment at a local place. A sky-ground representation is a spherical image with full field of view, combined with a vertical reference which is determined by sensing the direction of gravity. The scene of environment at a local place is observed by a spherical image sensor. The acquired spherical image is divided into two parts, sky part and ground part, along the horizon according to their positions, above or below the horizon. The horizon is determined by sensing the gravity from an acceleration sensor.
TL;DR: An overview of the history and theory of multiresolution image fusion approaches is presented, and several typical algorithms, including the pyramid transform, the wavelet transform and the wavelet frame transform, and fusion rules ranging from classical pixel-based schemes to region-based schemes are discussed and analyzed.
Abstract: An overview of the history and theory of multiresolution image fusion approaches is presented. First, the principle and development of multiresolution image fusion approaches are given. Then, several typical algorithms, including the pyramid transform, the wavelet transform and the wavelet frame transform, and fusion rules ranging from classical pixel-based schemes to region-based schemes are discussed and analyzed. Finally, the disadvantages and development trends of multiresolution image fusion approaches are presented in terms of multiresolution decompositions, fusion rules and evaluation criteria for the fusion results.
Abstract: One possibility of increasing the achievable sky coverage of an adaptive optics system compensating the optical aberrations due to atmospheric turbulence for astronomical observations is sensing the wavefront at near-infrared wavelengths, where many bright stars are found, which can be used as guide stars and have no visible counterparts. A pyramid wavefront sensor was chosen due to its advantages over the Shack-Hartmann sensor. It is expected to achieve a gain in terms of sensitivity, raising the limiting magnitude, when used in closed-loop regime. In this work the possibility of building such an instrument has been studied in the framework of a project called PYRAMIR, which will implement a new wavefront sensor in the adaptive optics system at the Calar Alto 3.5m telescope. An analytical model for the way in which atmospheric turbulence increases the linear range of this sensor at the cost of lower sensitivity, as usually is done through a mechanical modulation of the light beam, has been presented. Studies at the telescope, in the laboratory and through simulations show the possibility of using the pyramid wavefront sensor without any extra modulation. An experimental laboratory setup and numerical simulations of a full adaptive optics system were the main tools for establishing the optical requirements for the new instrument. Issues like the pyramid requirements and specifications, the effects of modulation and non-common path aberrations and spatial filters and their effects on the sensor have been analyzed in this way. The results were then directly applied in the design of PYRAMIR.
TL;DR: An original solution for shape representation is proposed which relies on a spatial partitioning approach and selects a discrete set of reference points with respect to which a relationship matrix is computed, accounting for the spatial distribution of shape pixels.
Abstract: An original solution for shape representation is proposed which relies on a spatial partitioning approach. The representation selects a discrete set of reference points with respect to which a relationship matrix is computed, accounting for the spatial distribution of shape pixels. This is accomplished at different levels of resolution by a tree based representation. Depending on the number of points, coarse to fine region and boundary shape information are captured. Properties of the representation are discussed and assessed through an experimental evaluation on a set of sample shapes.
TL;DR: A new algorithm for realistic-looking face reconstruction based on stereo vision is presented, and experimental results show that the proposed algorithm is robust and the 3D model is photo-realistic.
Abstract: 3D human face model reconstruction is essential to the generation of facial animations, which are widely used in the field of virtual reality (VR). The main issues of image-based 3D facial model reconstruction using vision technologies are twofold: one is to select and match the corresponding facial features in two images with minimal interaction, and the other is to generate a realistic-looking human face model. In this paper, a new algorithm for realistic-looking face reconstruction based on stereo vision is presented. First, a pattern is printed and attached to a planar surface for camera calibration; corner generation and corner matching between the two images are performed by integrating a modified image pyramid Lucas-Kanade (PLK) algorithm with a local adjustment algorithm; and the 3D coordinates of the corners are then obtained by 3D reconstruction. An individual face model is generated by deforming a general 3D model and interpolating the features. Finally, a realistic-looking human face model is obtained after texture mapping and eye modeling. In addition, some application examples in the field of VR are given. Experimental results show that the proposed algorithm is robust and the 3D model is photo-realistic.
TL;DR: An algorithm for feature point extraction is presented, based on a scale-space representation of the image and a system for tracking across scales, which produces stable and well-localized feature point estimates, two essential properties for video applications.
Abstract: An algorithm for feature point extraction is presented. It is based on a scale-space representation of the image as well as a system for tracking across scales. Using synthetic and real images, it is shown that the proposed algorithm produces stable and well-localized feature point estimates, two essential properties for video applications.
TL;DR: This paper shows that the emphasis degree is changed by changing the band-width of the Gaussian filter in order to improve the performance of the enlargement method based on Laplacian pyramid representation.
Abstract: The Laplacian pyramid is a hierarchical representation. Based on the Laplacian pyramid representation, the prediction of unknown higher-frequency components is equivalent to the prediction of an unknown high-resolution Laplacian image. A Gaussian filter is used to calculate the Laplacian pyramid, and the band-width of the Gaussian filter is optimal for image compression. However, we cannot assert that this band-width is optimal for the image enlargement method. In this paper, we change the band-width of the Gaussian filter in order to improve the performance of the enlargement method based on Laplacian pyramid representation. We show that the emphasis degree is changed by changing the band-width of the Gaussian filter.
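To make the pyramid structure concrete, here is a minimal one-dimensional sketch of a Laplacian pyramid: each level stores the difference between a signal and the prediction obtained by expanding its blurred, downsampled version. The fixed binomial kernel (0.25, 0.5, 0.25), the linear-interpolation expansion and all function names are assumptions for illustration; the abstract's variable Gaussian band-width and its prediction of the missing high-resolution Laplacian level are not implemented.

```python
def blur(signal, kernel=(0.25, 0.5, 0.25)):
    """3-tap low-pass filter with edge replication (assumed binomial kernel)."""
    n = len(signal)
    return [
        kernel[0] * signal[max(i - 1, 0)]
        + kernel[1] * signal[i]
        + kernel[2] * signal[min(i + 1, n - 1)]
        for i in range(n)
    ]


def reduce_level(signal):
    """Blur, then keep every second sample."""
    return blur(signal)[::2]


def expand_level(signal, length):
    """Upsample to `length` samples by linear interpolation."""
    out = []
    for i in range(length):
        j = i // 2
        if i % 2 == 0 or j + 1 >= len(signal):
            out.append(signal[j])
        else:
            out.append(0.5 * (signal[j] + signal[j + 1]))
    return out


def laplacian_pyramid(signal, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarsest low-pass signal]."""
    pyramid = []
    current = [float(x) for x in signal]
    for _ in range(levels):
        smaller = reduce_level(current)
        predicted = expand_level(smaller, len(current))
        pyramid.append([a - b for a, b in zip(current, predicted)])
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the pyramid: expand the coarse level and add back each detail."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        predicted = expand_level(current, len(detail))
        current = [d + p for d, p in zip(detail, predicted)]
    return current


signal = [1, 2, 4, 8, 16, 32, 64, 128]
pyramid = laplacian_pyramid(signal, levels=2)
restored = reconstruct(pyramid)
```

In this picture, enlarging an image amounts to running `reconstruct` with one extra, unknown detail level on top, which is why the abstract frames enlargement as predicting a high-resolution Laplacian image.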
TL;DR: Methods for fast extraction of dense disparity maps using the mean normalized cross correlation are studied, and fast cross correlation calculation, data structure optimization and a pyramid image-matching strategy are adopted to achieve fast dense disparity map extraction.
Abstract: Disparity map extraction, a very difficult problem, is at the core of stereovision research. Methods for fast extraction of dense disparity maps using the mean normalized cross correlation are studied. In the stereo matching process, fast cross correlation calculation, data structure optimization and a pyramid image-matching strategy are adopted to achieve fast dense disparity map extraction.
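The correlation score at the heart of such a matcher can be sketched as follows. The abstract does not define "mean normalized cross correlation"; this sketch assumes it means the zero-mean normalized cross correlation (ZNCC), and it searches along a single scanline by brute force. The fast correlation calculation, data structure optimization and pyramid strategy mentioned in the abstract are not reproduced, and the function names are hypothetical.

```python
import math


def zncc(a, b):
    """Zero-mean normalized cross correlation of two equal-length windows."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation is undefined, report no match
    return sum(x * y for x, y in zip(da, db)) / denom


def best_disparity(left, right, x, half_window, max_disparity):
    """Find the shift d for which right[x - d] best matches left[x]."""
    size = 2 * half_window + 1
    window = left[x - half_window : x + half_window + 1]
    best_d, best_score = 0, -2.0  # ZNCC lies in [-1, 1]
    for d in range(max_disparity + 1):
        start = x - d - half_window
        if start < 0:
            break
        score = zncc(window, right[start : start + size])
        if score > best_score:
            best_d, best_score = d, score
    return best_d, best_score


# Hypothetical scanlines: the right view is the left view shifted by 2 pixels.
left = [0, 0, 0, 0, 5, 9, 5, 0, 0, 0]
right = [0, 0, 5, 9, 5, 0, 0, 0, 0, 0]
disparity, score = best_disparity(left, right, x=5, half_window=1, max_disparity=3)
```

Because the score subtracts the window means and divides by the window energies, it is insensitive to gain and offset differences between the two cameras, which is the usual reason for preferring it over a raw sum of differences.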
TL;DR: In this paper, a novel image fusion method based on the expectation maximization (EM) algorithm and steerable pyramid is proposed to fuse the image components in the low frequency band.
Abstract: In this paper, a novel image fusion method based on the expectation maximization (EM) algorithm and steerable pyramid is proposed. The registered images are first decomposed by using steerable pyramid. The EM algorithm is used to fuse the image components in the low frequency band. The selection method involving the informative importance measure is applied to those in the high frequency band. The final fused image is then computed by taking the inverse transform on the composite coefficient representations. Experimental results show that the proposed method outperforms conventional image fusion methods.
TL;DR: This paper presents an application of interval analysis to a 3D reconstruction problem, building a partial boundary representation of an object that includes guaranteed information according to the camera model, with feature point coordinates described using intervals.
Abstract: This paper presents an application of interval analysis to a 3D reconstruction problem. The aim is to build a partial boundary representation of an object including guaranteed information according to the camera model. Feature point coordinates are described using intervals. This representation is used together with a method to search stereo correspondence based on the connectivity of segments.
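The idea of describing coordinates with intervals can be illustrated with a few lines of interval arithmetic. This is a generic sketch, not the paper's method: the `Interval` class, the half-pixel uncertainty and the sample coordinates are all assumptions, and a rigorous implementation would also round bounds outward to account for floating-point error, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] guaranteed to contain the true value."""

    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest result: our low minus their high; largest: the reverse.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = [
            self.lo * other.lo,
            self.lo * other.hi,
            self.hi * other.lo,
            self.hi * other.hi,
        ]
        return Interval(min(products), max(products))

    def contains(self, value):
        return self.lo <= value <= self.hi

    @property
    def width(self):
        return self.hi - self.lo


# Hypothetical feature point seen in two images, each known to +/- 0.5 pixel.
x_left = Interval(120.0, 121.0)
x_right = Interval(100.0, 101.0)

# The disparity inherits a guaranteed enclosure from its inputs.
disparity = x_left - x_right
```

Every operation returns an interval that encloses all results obtainable from values inside the input intervals, so bounds propagated through a camera model remain guaranteed rather than merely probable.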