TL;DR: A representation is constructed that captures the underlying statistical distribution of features in the image texture, as well as the variations in this distribution with viewing and illumination direction; it yields a compact representation and a recognition method in which a single novel image of unknown viewing and illumination direction can be classified efficiently.
Abstract: A bidirectional texture function (BTF) describes image texture as it varies with viewing and illumination direction. Many real-world surfaces, such as skin, fur, and gravel, exhibit fine-scale geometric surface detail. Accordingly, variations in appearance with viewing and illumination direction may be quite complex due to local foreshortening, masking, and shadowing. Representations of surface texture that support robust recognition must account for these effects. We construct a representation that captures the underlying statistical distribution of features in the image texture as well as the variations in this distribution with viewing and illumination direction. The representation combines clustering to learn characteristic image features and principal component analysis to reduce the space of feature histograms. This representation is based on a core image set, as determined by a quantitative evaluation of the importance of individual images in the overall representation. The result is a compact representation and a recognition method where a single novel image of unknown viewing and illumination direction can be classified efficiently. The CUReT (Columbia-Utrecht reflectance and texture) database is used as a test set for evaluation of these methods.
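The clustering-and-histogram stage described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: feature vectors are assumed to be small numeric tuples (for example, filter responses at each pixel), characteristic features are learned with plain k-means, each image is summarized by a normalized histogram of nearest-cluster labels, and a novel image is labeled by chi-square distance to stored model histograms. The PCA compression of the histogram space is omitted, and every function name here is hypothetical.

```python
# Minimal texton-histogram sketch (hypothetical helper names; PCA step omitted).

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to p (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return centroids


def texton_histogram(features, centroids):
    """Normalized histogram of nearest-centroid labels for one image."""
    hist = [0.0] * len(centroids)
    for f in features:
        hist[nearest(f, centroids)] += 1
    return [h / len(features) for h in hist]


def chi_square(h1, h2):
    """Chi-square distance between two histograms (0 when identical)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def classify(hist, models):
    """Return the label whose model histogram is closest in chi-square distance."""
    return min(models, key=lambda label: chi_square(hist, models[label]))
```

A fuller version would store histograms from many viewing and illumination directions per material and compress them with PCA, as the abstract describes, before matching.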
TL;DR: A flexible multiscale and directional representation for images is proposed that combines directional filter banks with the Laplacian pyramid to provide a sparse representation for two-dimensional piecewise smooth signals resembling images.
Abstract: A flexible multiscale and directional representation for images is proposed. The scheme combines directional filter banks with the Laplacian pyramid to provide a sparse representation for two-dimensional piecewise smooth signals resembling images. The underlying expansion is a frame and can be designed to be a tight frame. Pyramidal directional filter banks provide an effective method to implement the digital curvelet transform. The regularity issue of the iterated filters in the directional filter bank is examined.
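The Laplacian pyramid stage of this scheme can be illustrated in one dimension. The sketch below is a toy under explicit assumptions, not the paper's filter design: it uses a [1, 2, 1]/4 lowpass kernel with edge replication, decimation by two, and linear-interpolation upsampling, and it omits the directional filter bank entirely. Each detail level stores exactly what the prediction from the coarser level misses, so reconstruction is exact (up to floating-point rounding) for any choice of filters. All function names are hypothetical.

```python
# 1D Laplacian pyramid sketch: [1, 2, 1]/4 lowpass, decimation by 2, linear upsampling.

def lowpass(x):
    """Smooth with a [1, 2, 1] / 4 kernel, replicating the edge samples."""
    n = len(x)
    return [(x[max(i - 1, 0)] + 2 * x[i] + x[min(i + 1, n - 1)]) / 4 for i in range(n)]


def upsample(coarse, n):
    """Expand coarse samples back to length n by linear interpolation."""
    out = []
    for i in range(n):
        left = coarse[i // 2]
        if i % 2 == 0:
            out.append(left)
        else:
            right = coarse[i // 2 + 1] if i // 2 + 1 < len(coarse) else left
            out.append((left + right) / 2)
    return out


def build_pyramid(x, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarsest]."""
    pyramid = []
    for _ in range(levels):
        coarse = lowpass(x)[::2]  # smooth, then keep every other sample
        prediction = upsample(coarse, len(x))
        pyramid.append([a - b for a, b in zip(x, prediction)])  # bandpass detail
        x = coarse
    pyramid.append(x)
    return pyramid


def reconstruct(pyramid):
    """Invert build_pyramid: upsample the coarse signal and add each detail back."""
    x = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        x = [d + p for d, p in zip(detail, upsample(x, len(detail)))]
    return x
```

For a 10-sample input and three levels, the pyramid holds 10 + 5 + 3 + 2 = 20 coefficients, which illustrates the redundancy of the Laplacian pyramid.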
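TL;DR: This edited volume collects chapters spanning digital geometry and topology, fuzzy mathematics, picture languages, parallel image processing, object representations, texture classification and segmentation, edge measures, relaxation labeling, robust hierarchies, a pyramid framework for real-time computer vision, computational modeling of human vision, geometrical optical illusions, omnistereo imaging, and volumetric scene reconstruction from multiple views.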
Abstract: Preface. Contributing Authors. 1. Summation A. Rosenfeld. 2. Digital Geometry - The Birth of a New Discipline R. Klette. 3. Digital Topology T.Y. Kong. 4. Fuzzy Mathematics J.N. Mordeson. 5. Picture Languages A. Nakamura. 6. Parallel Image Processing A.Y. Wu. 7. Object Representations H. Samet. 8. Texture Classification and Segmentation R. Chellappa, B.S. Manjunath. 9. Edge Measures Using Similarity Regions M.K. Singh, N. Ahuja. 10. Relaxation Labeling: 25 Years and Still Iterating S.W. Zucker. 11. From a Robust Hierarchy to a Hierarchy of Robustness P. Meer. 12. A Pyramid Framework for Real-Time Computer Vision P.J. Burt. 13. On the Computational Modeling of Human Vision J. Beck. 14. Statistics Explains Geometrical Optical Illusions C. Fermuller, Y. Aloimonos. 15. Optics for OmniStereo Imaging Y. Pritch, et al. 16. Volumetric Scene Reconstruction from Multiple Views C.R. Dyer. Index.
TL;DR: The results presented correspond to the basic method without any refinement or combination with other techniques, suggesting that the approach may hold promise for future development.
TL;DR: An efficient content-based image retrieval (CBIR) system is presented that employs the shape information of images to facilitate retrieval, and its image indexing method is shown to support faster retrieval than other multi-dimensional indexing methods such as the R*-tree.
TL;DR: This work proposes to use recurrent neural networks for both the analysis and the synthesis steps of image reconstruction; reconstructing images iteratively makes it possible to use partial results as context information to resolve ambiguities.
Abstract: Successful image reconstruction requires the recognition of a scene and the generation of a clean image of that scene. We propose to use recurrent neural networks for both analysis and synthesis. The networks have a hierarchical architecture that represents images in multiple scales with different degrees of abstraction. The mapping between these representations is mediated by a local connection structure. We supply the networks with degraded images and train them to reconstruct the originals iteratively. This iterative reconstruction makes it possible to use partial results as context information to resolve ambiguities. We demonstrate the power of the approach using three examples: superresolution, fill-in of occluded parts, and noise removal/contrast enhancement. We also reconstruct images from sequences of degraded images.
TL;DR: An apparatus and method are disclosed for generating and/or obtaining a three-dimensional representation from a two-dimensional image and, in particular, for generating a three-dimensional image from the two-dimensional image.
Abstract: The present invention pertains to an apparatus and method for generating and/or for obtaining a three-dimensional representation from a two-dimensional image and, in particular, to an apparatus and method for generating a three-dimensional image from the two-dimensional image.
TL;DR: A theoretical basis for a computationally efficient approach to content-adaptive mesh generation used for image representation is provided, which leads to an improved version of the algorithm.
Abstract: Previously, we proposed a computationally efficient approach to content-adaptive mesh generation used for image representation (see Lee, J. et al., IEEE Int. Conf. Image Proc., 2000). We now provide a theoretical basis for that method, which leads to an improved version of the algorithm. An error bound is derived for a mesh representation of an image based on the theory of function interpolation. From this result, a more accurate scheme is proposed for placement of mesh elements in the image domain according to the image content. Experimental results, compared to other methods, show that a highly accurate image representation can be obtained at extremely low computational cost by the proposed technique.
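The kind of interpolation error bound the abstract mentions can be illustrated with the standard one-dimensional result: for linear interpolation over an interval of width h, the error is at most (h^2/8) max|f''|. The sketch below is a 1D analogue, not the paper's 2D mesh scheme: it recursively halves an interval until this bound, with |f''| estimated by central second differences, falls below a tolerance, so nodes concentrate where the function curves most. The function names, the sample count, and the minimum width are hypothetical choices.

```python
# 1D analogue of content-adaptive node placement (hypothetical names and defaults).

def max_second_derivative(f, a, b, samples=8):
    """Estimate max |f''| on [a, b] with central second differences."""
    h = (b - a) / samples
    return max(abs(f(a + (i - 1) * h) - 2 * f(a + i * h) + f(a + (i + 1) * h)) / h ** 2
               for i in range(1, samples))


def adaptive_nodes(f, a, b, tol, min_width=1e-3):
    """Halve [a, b] until the bound (h^2 / 8) * max|f''| is at most tol."""
    h = b - a
    if h ** 2 / 8 * max_second_derivative(f, a, b) <= tol or h <= min_width:
        return [a, b]
    mid = (a + b) / 2
    left = adaptive_nodes(f, a, mid, tol, min_width)
    right = adaptive_nodes(f, mid, b, tol, min_width)
    return left + right[1:]  # drop the shared midpoint
```

For f(x) = x^4 on [0, 1], the nodes cluster toward x = 1, where f'' = 12x^2 is largest, while a linear function needs only its two endpoints.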
TL;DR: It is proposed that soft segmentation is a more natural way to segment digital image data than crisp segmentation, and one method of deriving a soft segmentation from a weighted linked pyramid algorithm is shown.
TL;DR: The generalized Laplacian pyramid is used to fuse multispectral data with high-resolution panchromatic images. A decision based on thresholding the local correlation coefficient (CC) checks the physical congruence of fusion, while the ratio of local RMS values between the two images provides a space-varying gain factor by which the injected highpass contribution is equalized.
Abstract: This work presents a general and formal solution to the problem of fusing multispectral data with high-resolution panchromatic images. The method relies on the generalized Laplacian pyramid, an oversampled structure obtained by subtracting from an image its lowpass version, and selectively performs spatial-frequency spectrum substitution from one image to another. The novelty of the present work is that a decision based on thresholding the local correlation coefficient (CC) is used to check the physical congruence of fusion, while the ratio of local RMS values between the two images provides a space-varying gain factor by which the injected highpass contribution is equalized. Since the pyramid decomposition is not critically subsampled, possible impairments in the fused images due to missing cancellation of aliasing terms are avoided. Quantitative results are presented and discussed on simulated SPOT 5 data of an urban area (2.5 m panchromatic P, 10 m multispectral XS) obtained from the MIVIS airborne imaging spectrometer.
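The CC-gated, RMS-equalized detail injection can be sketched on 1D signals. This is a toy under several assumptions, not the authors' algorithm: a box filter without decimation stands in for the generalized Laplacian pyramid lowpass, the multispectral band is assumed to be already resampled to the panchromatic grid, "RMS" is taken over mean-removed window values, and the threshold `theta` and window `radius` are hypothetical parameters.

```python
import math

# Toy 1D sketch of context-driven detail injection (hypothetical names and parameters).

def smooth(x, radius=1):
    """Box lowpass with edge replication; a stand-in for the GLP lowpass filter."""
    n = len(x)
    return [sum(x[min(max(j, 0), n - 1)] for j in range(i - radius, i + radius + 1))
            / (2 * radius + 1) for i in range(n)]


def local_stats(a, b):
    """Correlation coefficient of a and b, and the ratio of their mean-removed RMS values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [v - ma for v in a], [v - mb for v in b]
    ra = math.sqrt(sum(v * v for v in da) / len(a))
    rb = math.sqrt(sum(v * v for v in db) / len(b))
    if ra == 0 or rb == 0:
        return 0.0, 0.0  # flat window: no evidence of congruence, inject nothing
    cc = sum(p * q for p, q in zip(da, db)) / (len(a) * ra * rb)
    return cc, ra / rb


def fuse(ms_up, pan, theta=0.5, radius=2):
    """Inject pan detail into the upsampled MS band where the local CC is >= theta."""
    pan_low = smooth(pan)
    detail = [p - q for p, q in zip(pan, pan_low)]
    fused = []
    for i in range(len(pan)):
        lo, hi = max(0, i - radius), i + radius + 1
        cc, gain = local_stats(ms_up[lo:hi], pan_low[lo:hi])
        fused.append(ms_up[i] + (gain * detail[i] if cc >= theta else 0.0))
    return fused
```

When the multispectral band equals the lowpass panchromatic signal, the gain is 1 and the fused result recovers the panchromatic signal; when the two are anticorrelated, no detail is injected.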
TL;DR: A SPIHT-based encoding method for video compression with a 3D wavelet transform is described in which the set significance level of each spatio-temporal subtree is computed and stored during initialization, so that the sorting step compares this stored level with the current significance level n instead of calling a function that computes the significance of a tree.
Abstract: The invention relates to an encoding method for the compression of a video sequence including successive frames organized in groups of frames. Each frame is decomposed by means of a three-dimensional (3D) wavelet transform leading to a given number of successive resolution levels. This method is based on the SPIHT algorithm, which transforms the original set of picture elements (pixels) of each group of frames into transform coefficients constituting a hierarchical pyramid, in which a spatio-temporal orientation tree defines the spatio-temporal relationship; the roots of this tree are formed with the pixels of the approximation subband resulting from the 3D wavelet transform, and the offspring of each of these pixels are formed with the pixels of the higher subbands corresponding to the image volume defined by these root pixels. According to the invention, a full exploration of the subbands is performed during the initialization step of the process, and the set significance level of each subtree of the root pixels is calculated and stored. In the sorting step of the process, a comparison between this set significance level and the current significance level n replaces the call to the function that computes the significance of a tree relative to n.
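In SPIHT, a set of coefficients is significant at level n when its largest magnitude is at least 2^n, so storing floor(log2(max |c|)) for each set turns the significance test into a single comparison. The sketch below illustrates that idea on a toy tree; it is not the patented encoder. It assumes integer coefficients, a hypothetical dict-of-children tree format, and takes each node's set to be its descendants (excluding the node itself), as in SPIHT's D(i, j) sets.

```python
# Toy sketch: precompute set significance levels, then compare with n (hypothetical format).

def precompute_set_levels(children, coeffs, root):
    """Store, for every node, floor(log2(max |c|)) over its descendants (-1 if all zero)."""
    levels = {}

    def max_descendant(node):
        m = 0
        for child in children.get(node, []):
            m = max(m, abs(coeffs[child]), max_descendant(child))
        levels[node] = m.bit_length() - 1  # integer floor(log2(m)); -1 when m == 0
        return m

    max_descendant(root)
    return levels


def is_significant_stored(levels, node, n):
    """Sorting-pass test using the stored level: a single comparison."""
    return levels[node] >= n


def is_significant_scan(children, coeffs, node, n):
    """Reference test that scans all descendants, as a plain significance function would."""
    stack = list(children.get(node, []))
    while stack:
        child = stack.pop()
        if abs(coeffs[child]) >= 2 ** n:
            return True
        stack.extend(children.get(child, []))
    return False
```

Both tests agree for every node and level; the stored version simply moves the tree traversal into the initialization pass.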
TL;DR: An improved morphological image representation is presented that can be used for image compression and achieves very high compression rates.
Abstract: This article presents an improved morphological image representation that can be used for image compression, achieving very high compression rates. The new image representation described in this work, called the skeleton structure, is a natural extension of the morphologic structure. This article presents its theoretical background, introduces the new representation, and shows some application examples.
TL;DR: Simulations show that the proposed networks have higher representation and generalization capability than RBF networks and multilayer networks with sigmoid activation functions.
Abstract: A general form of multilayer RBF networks is introduced. Complete supervised training rules for the parameters are also presented. To achieve global convergence, we apply a global optimization algorithm called the magic-brush method. This network can be naturally extended into a pyramid topology. Simulations show that the proposed networks have higher representation and generalization capability than RBF networks and multilayer networks with sigmoid activation functions.
TL;DR: Experimental results reveal that, at similar peak signal-to-noise ratio (PSNR) and bits per pixel (bpp), the proposed PIT scheme has a better feature-preserving capability than the reduced-difference pyramid PIT scheme.
TL;DR: An improved search algorithm for vector quantization using a mean pyramid structure and a range search approach is presented; it reduces search time and improves on the previous result of Lee and Chen.
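The means at the top of a mean pyramid already give a cheap lower bound for pruning: for k-dimensional vectors x and c, the Cauchy-Schwarz inequality gives ||x - c||^2 >= k (mean(x) - mean(c))^2. The sketch below uses only this top-level bound in a mean-ordered search; it is not the paper's mean-pyramid range search, which would tighten the bound with finer pyramid levels. The class and method names are hypothetical.

```python
import bisect

# Mean-ordered nearest-codeword search with the bound ||x - c||^2 >= k * (mean(x) - mean(c))^2.
# Hypothetical class; uses only the top (mean) level of a mean pyramid.

def sq_dist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))


def mean(v):
    return sum(v) / len(v)


class MeanOrderedCodebook:
    def __init__(self, codebook):
        self.words = sorted(codebook, key=mean)  # codewords ordered by mean
        self.means = [mean(w) for w in self.words]
        self.k = len(self.words[0])

    def search(self, x):
        """Return (best codeword, squared distance, number of full distance computations)."""
        mx = mean(x)
        start = bisect.bisect_left(self.means, mx)
        best, best_d, visited = None, float("inf"), 0
        # Walk upward, then downward, from the query's position in the mean order.
        for indices in (range(start, len(self.words)), range(start - 1, -1, -1)):
            for i in indices:
                if self.k * (mx - self.means[i]) ** 2 >= best_d:
                    break  # means only get farther from mx in this direction
                visited += 1
                d = sq_dist(x, self.words[i])
                if d < best_d:
                    best, best_d = self.words[i], d
        return best, best_d, visited
```

For example, with the codebook (0, 0), (10, 10), ..., (90, 90) and the query (31, 29), the search computes only one full distance before both directions are pruned.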
TL;DR: A coarse-to-fine focusing strategy is introduced into a multi-scale framework for automatic 3D non-rigid registration, leading to improved registration accuracy.
Abstract: In this paper, we embed the minimization scheme of an automatic 3D non-rigid registration method in a multi-scale framework. The initial model formulation was expressed as a robust multiresolution and multigrid minimization scheme. At the finest level of the multiresolution pyramid, we introduce a focusing strategy from coarse to fine scales, which improves the accuracy of the registration process. The focusing strategy has been tested for both a linear and a non-linear scale-space. Results on 3D ultrasound images are discussed.
TL;DR: Fusion systems based on derivatives of the Gaussian low-pass pyramid and on the discrete wavelet transform are examined, and their performance as a function of decomposition and selection parameters is evaluated and compared.
TL;DR: Starting from a binary digital image, a multi-valued pyramid is built and suitably processed so that the shape and topology properties of the pattern are satisfactorily preserved at all resolution levels.
TL;DR: In this paper, a segmentation-free tree-structure image representation is presented and a back-propagation through structure (BPTS) algorithm is adopted in order to learn the structure representation.
Abstract: Much research on image analysis and processing has been carried out over the last few decades. However, it is still challenging to represent image contents effectively and satisfactorily. In this paper, a segmentation-free tree-structure image representation is presented. To learn the structure representation, a back-propagation through structure (BPTS) algorithm is adopted. Experiments on plant image classification and on retrieval refinement, using only six visual features, were conducted on a plant image database and a natural scene image database, respectively. Encouraging results have been achieved.
TL;DR: An adaptive algorithm is presented for converting the quadtree representation of a binary image to its chain-code representation; the chain codes of the quadtree that results from a Boolean operation on two quadtrees are constructed by reusing the original chain codes.
TL;DR: An image watermarking technique based on pyramid transforms is proposed; it offers high imperceptibility, good robustness, and accurate detection, and can be applied to copyright notification, enforcement, and fingerprinting.
Abstract: An image watermarking technique based on pyramid transforms is proposed. An arbitrary binary pattern is formed into an effective hypothesized pattern and transmitted as a watermark. Multiresolution pyramid transforms are applied to host images, whose characteristics are exploited to embed the watermark. The detector is designed to be effective for a wide range of original signal sources and noise sources. The scheme is designed to achieve efficient trade-offs between perceptual invisibility, robustness, and trustworthy detection. The experiments demonstrate that the proposed technique has high imperceptibility, good robustness, and accurate detection. It can be applied to copyright notification, enforcement, and fingerprinting.
TL;DR: An efficient feature representation based on vector quantization and a novel image retrieval method that quantizes each image adaptively are presented.
Abstract: A novel method for multispectral image retrieval is presented. This method uses a representation of image features based on vector quantization. Feature representation is important for image retrieval, but there are difficulties in applying conventional histogram-based representations to multispectral images. We developed an efficient feature representation and a novel method for the retrieval of images by quantizing each image adaptively.
TL;DR: To demonstrate the advantage of the multiresolution tracking algorithm for parallel computation, a scheme is proposed for mapping the tracking algorithm onto a Transputer-based pyramidal parallel computing structure.
Abstract: This paper presents a multiresolution approach to visual motion tracking. In the approach, the foveation mechanism of the human visual system is used to model the multiresolution information perception algorithms of a Transputer-based pyramid visual tracking system. The video images of a moving target are transformed by a Gaussian pyramid generation algorithm into pyramidal data structures, each consisting of multiple image layers with different resolutions. The tracking of a moving target over an image sequence is accomplished by performing a foveal search based on iterative intensity pattern correlation along the multiple resolution levels of the Gaussian pyramids of two successive images. Analyses of the efficiency and accuracy of the tracking algorithm show that it is over 160 times faster than conventional mono-resolution tracking methods, with tracking error within one pixel. To demonstrate the advantage of the multiresolution tracking algorithm for parallel computation, a scheme for mapping the tracking algorithm onto a Transputer-based pyramidal parallel computing structure is proposed. Experimental results demonstrate good performance of the proposed approach.
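The coarse-to-fine search over Gaussian pyramid levels can be sketched for 1D signals. This is an illustration under assumptions, not the authors' tracker: it uses a [1, 2, 1]/4 smoothing kernel, substitutes mean squared difference for intensity correlation, searches exhaustively only at the coarsest level, and refines by plus or minus one sample at each finer level. The names and the `levels` and `search` defaults are hypothetical, and `search` must stay small enough that the shifted signals still overlap at the coarsest level.

```python
# 1D coarse-to-fine shift estimation (hypothetical names; squared difference replaces correlation).

def pyramid_reduce(x):
    """One Gaussian-pyramid step: [1, 2, 1] / 4 smoothing, then keep every other sample."""
    n = len(x)
    smoothed = [(x[max(i - 1, 0)] + 2 * x[i] + x[min(i + 1, n - 1)]) / 4 for i in range(n)]
    return smoothed[::2]


def gaussian_pyramid(x, levels):
    pyramid = [x]
    for _ in range(levels):
        pyramid.append(pyramid_reduce(pyramid[-1]))
    return pyramid  # pyramid[0] is full resolution, pyramid[-1] the coarsest


def mismatch(a, b, shift):
    """Mean squared difference between a[i] and b[i + shift] over their overlap."""
    pairs = [(a[i], b[i + shift]) for i in range(len(a)) if 0 <= i + shift < len(b)]
    return sum((p - q) ** 2 for p, q in pairs) / len(pairs)


def coarse_to_fine_shift(prev, curr, levels=3, search=4):
    """Estimate s such that curr[i + s] matches prev[i]."""
    pa, pb = gaussian_pyramid(prev, levels), gaussian_pyramid(curr, levels)
    # Exhaustive search only at the coarsest level, where it is cheap.
    shift = min(range(-search, search + 1), key=lambda s: mismatch(pa[-1], pb[-1], s))
    for level in range(levels - 1, -1, -1):
        guess = 2 * shift  # shifts double when resolution doubles
        shift = min((guess - 1, guess, guess + 1),
                    key=lambda s: mismatch(pa[level], pb[level], s))
    return shift
```

With three levels, the exhaustive search covers only 9 shifts on an 8-sample signal, and each finer level evaluates just 3 candidates, which is where the savings over a full-resolution search come from.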
TL;DR: A base signal is recursively decomposed and modified for a desired number of pyramid levels; at each level, the decomposed signal from the previous level is modified to improve one or more signal components or characteristics.
Abstract: A method, system, and software are disclosed for improving the quality of a signal. A base signal is recursively decomposed and modified for a desired number of pyramid levels. At each level, the decomposed signal from the previous level is modified to improve one or more signal components or characteristics. The modified signal from each level is then decomposed to form the next level of the pyramidal decomposition. Starting at the second-to-last level of the pyramidal decomposition, the improved signal of the last pyramid level is recomposed and then combined with one or more signals from the current pyramid level, resulting in an improved signal for the current level. This recomposition and combination of the improved signal of the previous level is repeated for each level until the top (level 0) of the pyramidal decomposition is reached. The improved base signal may or may not be combined with the original base signal, depending on the desired outcome. The present invention finds particular application in photography and digital film processing, where the illustrated method may be used to improve image quality.
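The recursive decompose, modify, recompose structure described above can be sketched with a one-level Haar split (pairwise averages and half-differences) standing in for the unspecified pyramid decomposition. The Haar choice, the function names, and the `modify_detail` callback are hypothetical; the callback marks where a real system would apply its signal-improvement step at each level.

```python
# Recursive decompose / modify / recompose sketch with a Haar split (hypothetical names).

def decompose(x):
    """Split an even-length signal into pairwise averages and half-differences."""
    low = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return low, high


def recompose(low, high):
    """Exact inverse of decompose."""
    out = []
    for avg, diff in zip(low, high):
        out.extend((avg + diff, avg - diff))
    return out


def improve(x, levels, modify_detail):
    """Modify this level's detail, recurse on the coarse part, then recompose."""
    if levels == 0 or len(x) < 2 or len(x) % 2:
        return list(x)  # stop at the requested depth or when the length is odd
    low, high = decompose(x)
    high = modify_detail(high)  # improve this level's components
    low = improve(low, levels - 1, modify_detail)  # the next level is built from this one
    return recompose(low, high)
```

With an identity callback the input comes back unchanged; zeroing the details at every level collapses the signal toward its coarse averages.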
TL;DR: A multiplierless pyramid filter is described that comprises a sequence of scalable cascaded units, each comprising a delay unit and three adders, coupled to produce a higher-order pyramidally filtered output signal sample stream and state variable sample stream.
Abstract: A multiplierless pyramid filter is described comprising a sequence of scalable cascaded units, each of said units comprising a delay unit and three adders, said delay unit and adders being coupled to produce a higher order pyramidally filtered output signal sample stream and state variable sample stream from an input signal sample stream and a lower order pyramidally filtered output signal sample stream and state variable signal stream.
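Why a pyramid filter can be built without multipliers is easy to see with a standard construction: convolving two length-n box filters gives a triangular (pyramid-shaped) impulse response 1, 2, ..., n, ..., 2, 1, and each box is a running sum that needs only a delay line and additions. This sketch illustrates that general principle; it is not the patented cascaded unit with three adders, and the function names are hypothetical.

```python
from collections import deque

# Multiplierless pyramid (triangular) filter as two cascaded running sums (hypothetical names).

def running_sum(x, n):
    """Sum of the last n input samples, using one delay line and additions only."""
    delay = deque([0] * n)  # the n most recent inputs, oldest first
    acc = 0
    out = []
    for v in x:
        acc += v - delay.popleft()  # add the newest sample, drop the one leaving the window
        delay.append(v)
        out.append(acc)
    return out


def pyramid_filter(x, n):
    """Cascade two length-n running sums: impulse response 1, 2, ..., n, ..., 2, 1."""
    return running_sum(running_sum(x, n), n)
```

For n = 3, an impulse input produces the coefficients 1, 2, 3, 2, 1, matching a direct convolution with that kernel.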
TL;DR: This work proposes an image representation and processing framework using a multiscale triangulation of the grayscale function, and demonstrates the approximation performance of the normal mesh representation through mathematical analyses for simple functions and simulations for real images.
Abstract: Multiresolution triangulation meshes are widely used in computer graphics for 3D modeling of shapes. We propose an image representation and processing framework using a multiscale triangulation of the grayscale function. Triangles can potentially approximate edges better than the blocky structures of tensor-product wavelets. Among the many possible triangulation schemes, normal meshes are natural for efficiently representing singularities in image data, thanks to their adaptivity to the smoothness of the modeled image. Our non-linear, multiscale image decomposition algorithm, based on this subdivision scheme, takes edges into account in a way that is closely related to wedgelets and curvelets. The highly adaptive property of the normal mesh construction provides a very efficient representation of images, which potentially outperforms standard wavelet transforms. We demonstrate the approximation performance of the normal mesh representation through mathematical analyses for simple functions and simulations for real images.
TL;DR: A new generalisation of scale-space and pyramids is introduced that combines statistical modelling with a spatial representation, applying the familiar concept of multiple resolutions to a Gaussian mixture representation of the image, hence the name MGMM.
Abstract: This paper introduces a new generalisation of scale-space and pyramids, which combines statistical modelling with a spatial representation. The representation uses the familiar concept of multiple resolutions, but applied to a Gaussian mixture representation of the image, hence the name MGMM. It is shown that MGMM can approximate any probability density and can adapt to smooth motions. After a presentation of the theory, it is shown how MGMM can be applied to the estimation of visual motion.
TL;DR: A method of compressing an image is described in which digital data signals in a 2D image are formed into an image data pyramid with a number of layers, and each layer is processed to give a compressed encoding in an ordered list.
Abstract: A method of compressing an image is described in which digital data signals in a two-dimensional image are formed into an image data pyramid with a number of layers, and each layer is processed to give a compressed encoding in an ordered list. The encoding with the largest quality gain factor is selected first and added to a compressed representation of the data array. This is repeated for the next largest gain factor and so on until a predetermined maximum is reached. Because each layer of the image data pyramid corresponds to a different frequency band, the vector quantizations of these layers interfere only minimally with one another. This allows a simple ordering of all possible gain contributions made by the compressed encodings to the compressed representation, which in turn allows a straightforward selection of the compressed encodings with the largest quality gain factors for compiling the compressed representation of the image.
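The "largest gain first, then the next largest" selection amounts to popping from a max-heap keyed on quality gain. The sketch below interprets the "predetermined maximum" as a maximum number of encodings (it could equally be a bit budget) and additionally stops when no positive gain remains; both choices, the tuple format, and the function name are hypothetical.

```python
import heapq

# Greedy selection of compressed encodings by quality gain (hypothetical tuple format).

def select_encodings(candidates, max_count):
    """Take encodings in order of decreasing gain until max_count or no positive gain remains.

    candidates: iterable of (gain, layer, block_id) tuples.
    Returns a list of (layer, block_id, gain) in selection order.
    """
    heap = [(-gain, layer, block_id) for gain, layer, block_id in candidates]
    heapq.heapify(heap)  # max-heap on gain via negated keys
    chosen = []
    while heap and len(chosen) < max_count:
        neg_gain, layer, block_id = heapq.heappop(heap)
        if neg_gain >= 0:
            break  # remaining encodings would not improve quality
        chosen.append((layer, block_id, -neg_gain))
    return chosen
```

Because the layers cover nearly independent frequency bands, the abstract argues that their gains can be ranked in one combined list like this without re-evaluating them after each selection.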
TL;DR: A block-matching motion estimation algorithm whose computations adapt to content complexity is presented; it is made macroblock-adaptive by dynamically varying the number of candidate motion vectors passed to lower levels, depending on the frequency characteristics of the macroblock being matched and the complexity in the sequence for such characteristics.
Abstract: Power consumption has emerged as an important constraint in the design of mobile video encoders. As motion estimation accounts for the majority of the computations involved in video encoding, the algorithm and architecture used affect the quality and power levels of the final solution. In this paper, we present a block-matching motion estimation algorithm whose computations are content-complexity adaptive. The basic framework used is the multi-resolution mean pyramid technique. The algorithm is made macroblock-adaptive by dynamically varying the number of candidate motion vectors passed to lower levels, depending on the frequency characteristics of the macroblock being matched and the complexity in the sequence for such characteristics. We use the concept of a deviation pyramid to estimate the macroblock frequency characteristics. Simulation results show that for typical videophony sequences, the algorithm reduces computational complexity by a factor ranging from 15.5 to 74.0, while maintaining PSNR values close to those obtained with the full-search block matching algorithm. Simple operations are used in the algorithm to make it suitable for hardware implementation.
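A mean-pyramid step and an adaptive candidate count can be sketched as follows. The abstract does not define the deviation pyramid, so `deviation_level` is a hypothetical stand-in (mean absolute difference between each pixel and its 2x2 block mean), and the thresholds and candidate counts in `num_candidates` are invented for illustration; the sketch only shows the shape of the idea, with more candidates passed down for high-frequency macroblocks.

```python
# Mean pyramid step, a hypothetical deviation measure, and an adaptive candidate count.

def mean_level(img):
    """Average non-overlapping 2x2 blocks (one mean-pyramid step); sides assumed even."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def deviation_level(img):
    """Hypothetical deviation: mean |pixel - block mean| over each 2x2 block."""
    means = mean_level(img)
    return [[sum(abs(img[y + dy][x + dx] - means[y // 2][x // 2])
                 for dy in (0, 1) for dx in (0, 1)) / 4
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def num_candidates(activity, thresholds=(1.0, 4.0, 16.0), counts=(1, 2, 4, 8)):
    """Pass more candidate motion vectors down for high-activity (high-frequency) macroblocks."""
    for threshold, count in zip(thresholds, counts):
        if activity < threshold:
            return count
    return counts[-1]
```

A flat block yields zero deviation and a single candidate, while a checkerboard block yields a large deviation and more candidates; the averaging and absolute differences use only additions, subtractions, and divisions by powers of two.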