TL;DR: Find the secret to improving your quality of life by reading Adaptive Blind Signal and Image Processing, and let its words add value to your life.
Abstract: Find the secret to improving your quality of life by reading Adaptive Blind Signal and Image Processing. This is the kind of book you need now, and it may become a favorite once you have it. Why? Because this book has characteristics that set it apart from others. You may not need to know who the author is or how well known the work is. As the wise saying goes, never judge words by who speaks them; instead, let the words add value to your life.
TL;DR: This work presents the first real-time multi-view face detection system, which runs at 5 frames per second on 320×240 image sequences and is trained using a new meta boosting learning algorithm.
Abstract: We present a detector-pyramid architecture for real-time multi-view face detection. Using a coarse-to-fine strategy, the full view is partitioned into finer and finer views. Each face detector in the pyramid detects faces of its respective view range. Its training is performed using a new meta boosting learning algorithm. This results in the first real-time multi-view face detection system, which runs at 5 frames per second on 320×240 image sequences.
TL;DR: In this paper, the contourlet transform is proposed to satisfy the anisotropy scaling relation for curves, and thus offers a fast and structured curvelet-like decomposition.
Abstract: We propose a new scheme, named contourlet, that provides a flexible multiresolution, local and directional image expansion. The contourlet transform is realized efficiently via a double iterated filter bank structure. Furthermore, it can be designed to satisfy the anisotropy scaling relation for curves, and thus offers a fast and structured curvelet-like decomposition. As a result, the contourlet transform provides a sparse representation for two-dimensional piecewise smooth signals resembling images. Finally, we show some numerical experiments demonstrating the potential of contourlets in several image processing tasks.
TL;DR: Results show that the HPNN architecture, trained using the uncertain object position (UOP) error function, reduces the FP rate of a mammographic CAD system by approximately 50% without significant loss in sensitivity.
Abstract: This paper describes a pattern recognition architecture, which we term hierarchical pyramid/neural network (HPNN), that learns to exploit image structure at multiple resolutions for detecting clinically significant features in digital/digitized mammograms. The HPNN architecture consists of a hierarchy of neural networks, each network receiving feature inputs at a given scale as well as features constructed by networks lower in the hierarchy. Networks are trained using a novel error function for the supervised learning of image search/detection tasks when the position of the objects to be found is uncertain or ill defined. We have evaluated the HPNN's ability to eliminate false positive (FP) regions of interest generated by the University of Chicago's computer-aided diagnosis (CAD) systems for microcalcification and mass detection. Results show that the HPNN architecture, trained using the uncertain object position (UOP) error function, reduces the FP rate of a mammographic CAD system by approximately 50% without significant loss in sensitivity. Investigation into the types of FPs that the HPNN eliminates suggests that the pattern recognizer is automatically learning and exploiting contextual information. Clinical utility is demonstrated through the evaluation of an integrated system in a clinical reader study. We conclude that the HPNN architecture learns contextual relationships between features at multiple scales and integrates these features for detecting microcalcifications and breast masses.
TL;DR: A new non-uniform hierarchical scheme with the ability to handle different coarseness levels simultaneously is proposed, based on a maximum flow formulation, which allows a much better localization of object boundaries where large depth discontinuities are present.
Abstract: This paper addresses the stereo correspondence problem where the images are large enough to make stereo matching difficult. In order to reduce the problem size, we propose a new non-uniform hierarchical scheme with the ability to handle different coarseness levels simultaneously. Our framework, based on a maximum flow formulation, allows a much better localization of object boundaries where large depth discontinuities are present. The uniform decomposition fails to localize such borders precisely because it assumes that surfaces are smooth in order to correct the errors from one coarseness level to the next. Our disparity estimation accurately localizes large depth discontinuities and then focuses on increasing the resolution of smooth surfaces. Results on synthetic and real images demonstrate the validity of our framework.
TL;DR: A novel fusion algorithm is presented for multisensor images based on the discrete multiwavelet transform that can be performed at pixel level and may be used to combine images from multiple sensors to obtain a single composite with extended information content.
Abstract: The authors review the recent notion of multiwavelets and describe the use of the discrete multiwavelet transform (DMWT) in image fusion processing. Multiwavelets are extensions of scalar wavelets and have several advantages over them. Multiwavelet analysis can offer more precise image analysis than wavelet multiresolution analysis. A novel fusion algorithm is presented for multisensor images based on the discrete multiwavelet transform that can be performed at pixel level. After the source images are registered, a pyramid for each source image can be obtained by applying decomposition with multiwavelets at each level. The multiwavelet decomposition coefficients of the input images are appropriately merged, and a new fused image is obtained by reconstructing the fused multiwavelet coefficients. This image fusion algorithm may be used to combine images from multiple sensors to obtain a single composite with extended information content. The results of experiments indicate that this image fusion algorithm can provide a more satisfactory fusion outcome.
TL;DR: A method and system for creating 3D models of implant-bearing dental arches, and other anatomical fields of view, employs 3D scanning means to capture images of an anatomical field of view wherein there have been positioned (and preferably affixed to an anatomical feature) one or more three-dimensional recognition objects having a known geometry, such as a pyramid or a linked grouping of spheres.
Abstract: A method and system for creating three-dimensional models of implant-bearing dental arches, and other anatomical fields of view, employs three-dimensional scanning means to capture images of an anatomical field of view wherein there have been positioned (and preferably affixed to an anatomical feature) one or more three-dimensional recognition objects having a known geometry, such as a pyramid or a linked grouping of spheres. Image processing software is employed to locate and orient said recognition objects as reference data for stitching multiple images and thereby reconstructing the scanned field of view. Recognition objects placed in areas of low feature definition enhance the accuracy of three-dimensional modeling of such areas.
TL;DR: The benefit of the multigrid approach is that a large-neighbourhood CAR model is replaced with a set of several simpler CAR models that are easy to synthesize, which widens the application areas of these multigrid models, capable of reproducing realistic textures for enhancing realism in various texture applications.
Abstract: A fast recursive model-based algorithm for realistic colour texture synthesis is proposed. The algorithm starts with a colour texture image decomposition into a multiresolution grid. The band-pass colour factors are each independently modelled by a dedicated 3D causal autoregressive random field (CAR) model. We estimate an optimal contextual neighbourhood and parameters for each CAR submodel. Finally, the synthesized multiresolution colour texture pyramid is collapsed into the required fine-resolution colour texture. The benefit of the multigrid approach is that a large-neighbourhood CAR model is replaced with a set of several simpler CAR models that are easy to synthesize, which widens the application areas of these multigrid models, capable of reproducing realistic textures for enhancing realism in various texture applications.
TL;DR: An image compression apparatus for compressing image data that includes a luminance signal and chrominance signals includes an image judgement part that judges characterizing features of the image data on the basis of the luminance signal and/or chrominance signals.
Abstract: An image compression apparatus for compressing image data that includes a luminance signal and chrominance signals includes an image judgement part that judges characterizing features of the image data on the basis of the luminance signal and/or chrominance signals, a subsampling rate setting part which sets the subsampling rate of the chrominance signals in accordance with the characterizing features thus judged, a subsampling processing part which performs subsampling processing of the image data at the set subsampling rate, and a compression encoding part which subjects the subsampling-processed image data to compression encoding.
TL;DR: An octave-band family of directional filter banks for images, capable of dividing the spatial frequency spectrum both angularly and radially using an efficient structure, with the properties of joint radial frequency selectivity and pyramid oriented frequency selectivity.
Abstract: This paper introduces an octave-band family of directional filter banks for images, capable of dividing the spatial frequency spectrum both angularly and radially using an efficient structure. The new directional filter bank employs efficient separable analysis and synthesis filters with perfect reconstruction. Unique to the octave-band directional filter bank family are the properties of joint radial frequency selectivity and pyramid oriented frequency selectivity, all within a maximally decimated structure.
TL;DR: In this article, the rotational rate sensors and acceleration sensors are mounted on a sensor platform forming a three-sided truncated pyramid so that all possible three-way combinations of sensors are mutually linearly independent.
Abstract: A method and a system for detecting the spatial movement state of moving objects, e.g., vehicles. Owing to an arrangement (for example, a non-Cartesian one) of four rotational rate sensors and/or acceleration sensors, a redundant signal can be obtained in addition to the desired useful signal indicating the spatial movement state, e.g., the rotational movement and/or acceleration in space; if this redundant signal is large enough in comparison with the rotational rate actually applied, it may be used to detect the size of the error and the defective sensor. The four sensors are mounted, for example, on a sensor platform forming a three-sided truncated pyramid so that all possible three-way combinations of sensors are mutually linearly independent. The accuracy about the vertical axis is defined by the angle of inclination of the side faces of the truncated pyramid.
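The redundancy idea in this abstract can be illustrated with a small parity check. The following is only a sketch under stated assumptions, not the patented method: it assumes one sensor axis along the normal of the top face and one along the normal of each of the three side faces, and the function names (`sensor_axes`, `parity_vector`, `measure`, `parity`) are hypothetical. Because every three axes are linearly independent, all entries of the parity vector are nonzero, so an error on any single sensor shows up in the redundant signal.

```python
import math
from typing import List, Sequence


def sensor_axes(inclination_deg: float) -> List[List[float]]:
    """Assumed layout: one axis on the top face, three on the side faces."""
    a = math.radians(inclination_deg)
    axes = [[0.0, 0.0, 1.0]]  # top face normal (assumption)
    for k in range(3):
        phi = math.radians(120.0 * k)  # side faces spaced 120 degrees apart
        axes.append([math.sin(a) * math.cos(phi),
                     math.sin(a) * math.sin(phi),
                     math.cos(a)])
    return axes


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def parity_vector(axes: Sequence[Sequence[float]]) -> List[float]:
    """Signed 3x3 minors of the 4x3 axis matrix; orthogonal to every column."""
    v = []
    for i in range(4):
        minor = [row for j, row in enumerate(axes) if j != i]
        v.append((-1) ** i * det3(minor))
    return v


def measure(axes: Sequence[Sequence[float]], rate: Sequence[float]) -> List[float]:
    """Ideal sensor readings: projection of the true rate onto each axis."""
    return [sum(a * w for a, w in zip(axis, rate)) for axis in axes]


def parity(v: Sequence[float], readings: Sequence[float]) -> float:
    """Redundant signal: zero for consistent readings, nonzero after a fault."""
    return sum(a * b for a, b in zip(v, readings))
```

With fault-free readings the parity value is zero up to rounding; adding an offset to one reading makes it proportional to that offset. Identifying which sensor is defective would need further logic that this sketch does not attempt.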
TL;DR: In this paper, a pyramid of image tiles is generated to represent a source image at different resolutions, where each level of the pyramid is generated by mapping a grouping of elements from the base image tile to an image tile at different levels of a pyramid.
Abstract: A technique generates a pyramid of image tiles to represent a source image at different resolutions. A base image tile stores, in a plurality of elements, an “on” state or an “off” state to represent the source image at a first resolution. Additional image tiles, with image resolutions lower than the resolution of the base image tile, are generated. The base image tile is divided into groupings of elements, such that each level of the pyramid of image tiles is generated by mapping a grouping of elements from the base image tile to an element of an image tile at that level of the pyramid. A threshold density of “on” elements in a grouping is selected. If the grouping of elements in the base image tile for a level reaches the threshold density of “on” elements, the image data for the corresponding element in the current level is set to an “on” state. Conversely, the image data for the element is set to an “off” state if the density of “on” elements in the base image tile grouping is less than the threshold density.
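The threshold-density mapping described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming a square base tile whose side is divisible by the grouping size and groupings of 2^k × 2^k elements at level k; the name `build_tile_pyramid` and the default threshold of 0.5 are hypothetical and do not come from the source.

```python
from typing import List

Tile = List[List[int]]  # 1 = "on", 0 = "off"


def build_tile_pyramid(base: Tile, levels: int, threshold: float = 0.5) -> List[Tile]:
    """Return [base, level 1, ..., level `levels`].

    Each element at level k summarizes a 2^k x 2^k grouping of base elements:
    it is "on" when the fraction of "on" elements in the grouping reaches
    the threshold density, and "off" otherwise.
    """
    size = len(base)
    pyramid = [base]
    for k in range(1, levels + 1):
        block = 2 ** k  # assumed grouping size at this level
        if size % block != 0:
            raise ValueError("base tile side must be divisible by 2**levels")
        n = size // block
        tile = []
        for r in range(n):
            row = []
            for c in range(n):
                on_count = sum(base[r * block + i][c * block + j]
                               for i in range(block)
                               for j in range(block))
                density = on_count / (block * block)
                row.append(1 if density >= threshold else 0)
            tile.append(row)
        pyramid.append(tile)
    return pyramid
```

Every level is computed directly from the base tile rather than from the level below it, which matches the abstract's wording that groupings are taken from the base image tile.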
TL;DR: This paper presents an automatic two-dimensional image registration method that uses a so-called steerable pyramid transform; owing to the steerability of the transform, image features along certain orientations can be obtained.
TL;DR: In this paper, it was shown that if the polynomial variables q and t of the Lawrence-Krammer representation are chosen to be appropriate algebraically independent unit complex numbers, then the form is negative-definite Hermitian.
Abstract: A non-singular sesquilinear form is constructed that is preserved by the Lawrence–Krammer representation. It is shown that if the polynomial variables q and t of the Lawrence–Krammer representation are chosen to be appropriate algebraically independent unit complex numbers, then the form is negative-definite Hermitian. Using the fact that non-invertible knots exist, this result implies that there are matrices in the image of the Lawrence–Krammer representation that are conjugate in the unitary group, yet the braids that they correspond to are not conjugate as braids. The two primary tools involved in constructing the sesquilinear form are Bigelow's interpretation of the Lawrence–Krammer representation and the Morse theory of functions on manifolds with corners.
TL;DR: The paper describes the principle of image decomposition, the possibilities for recursive calculation, its basic characteristics and modifications, and some results of its modelling show the application capacities in image coding systems.
Abstract: This paper presents a method for pyramidal image decomposition called “inverse” because of the order followed to obtain the pyramid levels: from top to bottom, in correspondence with the requirement for “progressive” image transmission. The pyramid top (level zero) consists in selecting the low-frequency coefficients of the discrete cosine image transform. The following pyramid levels are made up of low-frequency discrete cosine transform (DCT) coefficients of the subimages obtained from quadtree division at each level. The quadtree root coincides with the pyramid top. The first level is the difference between the image and its approximation obtained by inverse DCT. The following (second) level is a difference too, between the previous (first) level and its approximation obtained with inverse DCT for every subimage in the first level, etc. The paper describes the principle of image decomposition, the possibilities for recursive calculation, its basic characteristics and modifications. The block diagram and the generalised scheme of the decomposition are given and some results of its modelling show the application capacities in image coding systems.
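The top-to-bottom order of this decomposition can be illustrated with a deliberately simplified sketch. As an assumption made only for brevity, it keeps just the DC term of each subimage, which for the DCT is proportional to the subimage mean, whereas the paper retains a set of low-frequency DCT coefficients. The image is assumed square with a power-of-two side, and all function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def block_means(img: Image, blocks: int) -> Image:
    """Mean of each subimage when img is split into blocks x blocks parts."""
    step = len(img) // blocks
    return [[sum(img[br * step + i][bc * step + j]
                 for i in range(step) for j in range(step)) / (step * step)
             for bc in range(blocks)]
            for br in range(blocks)]


def expand(means: Image, size: int) -> Image:
    """Approximation image: every pixel takes the mean of its subimage."""
    step = size // len(means)
    return [[means[r // step][c // step] for c in range(size)]
            for r in range(size)]


def decompose(img: Image, levels: int) -> Tuple[List[Image], Image]:
    """Top-down pyramid: level p stores coefficients of 2^p x 2^p subimages."""
    size = len(img)
    residual = [list(map(float, row)) for row in img]
    coeffs = []
    for p in range(levels):
        blocks = 2 ** p  # quadtree division: 1, 2, 4, ... parts per side
        if size % blocks != 0:
            raise ValueError("image side must be divisible by 2**(levels-1)")
        means = block_means(residual, blocks)
        approx = expand(means, size)
        # The next level works on the difference left by this approximation.
        residual = [[residual[r][c] - approx[r][c] for c in range(size)]
                    for r in range(size)]
        coeffs.append(means)
    return coeffs, residual


def reconstruct(coeffs: List[Image], size: int) -> Image:
    """Progressive reconstruction: add the approximation of each level in turn."""
    out = [[0.0] * size for _ in range(size)]
    for means in coeffs:
        approx = expand(means, size)
        out = [[out[r][c] + approx[r][c] for c in range(size)]
               for r in range(size)]
    return out
```

Summing the levels from the top down gives successively finer approximations, which is the property that makes this ordering suit progressive transmission. In this simplified variant the reconstruction becomes exact once the subimages shrink to single pixels.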
TL;DR: This work presents an efficient and automatic algorithm to create an adaptive image representation called SmartNail, which is defined as an appropriately cropped part of a suitably scaled-down image.
Abstract: To bridge the mismatch between the sizes of images and display devices, we present an efficient and automatic algorithm to create an adaptive image representation called SmartNail. Given a digital image and rectangular display frame smaller than the image, we define the SmartNail as an appropriately cropped part of a suitably scaled-down image. We choose the SmartNail-defining parameters - down-scaling factor and cropping location - to maximize a bit-allocation-based cost function that quantifies the visual importance of the image content in the SmartNail. For JPEG2000-encoded images, the SmartNail parameters can be determined using just the header information available in the encoded file. Hence, only the wavelet coefficients required to reconstruct the SmartNail need to be decoded from the entire JPEG2000 code stream. Consequently, the SmartNail construction requires minimal computation and memory. Simulations demonstrate the effectiveness of the SmartNail representations.
TL;DR: A new image enlarging method is proposed that can enlarge an image to an arbitrary size based on a new Laplacian pyramid representation, and the effectiveness of the proposed method is shown through many experimental results.
Abstract: Enlargement of digital images is one of the most basic image processing operations. Image enlargement corresponds to narrowing the sampling interval of the digital image. Thus, it is necessary to estimate the unknown higher-frequency components in the enlargement process. Recently, several methods that include the prediction of unknown higher-frequency components have been proposed. We have also proposed an image enlarging method based on the Laplacian pyramid representation. However, these methods can produce only a twofold enlarged image. In this paper, we propose a new image enlarging method that can enlarge an image to an arbitrary size based on a new Laplacian pyramid representation. The effectiveness of the proposed method is shown through many experimental results.
TL;DR: An image detecting device for pattern matching using a pyramid structure search is described. It comprises an input image storing portion for receiving and holding image data from an external device, a pattern image storing portion for storing a pattern image to be searched, an image processing portion connected to both storing portions, a control portion for calculating set parameter candidates, a display portion for displaying groups of set parameter candidates that are calculated in the control portion as a table, and an input portion used to change set parameter values based on the groups of candidates displayed on the display portion.
Abstract: An image detecting device for executing pattern matching using a pyramid structure search comprises an input image storing portion for receiving and holding image data from an external device, a pattern image storing portion for storing a pattern image to be searched, an image processing portion connected to the input image storing portion and the pattern image storing portion, for reading respective image data to execute the pattern matching, a control portion for calculating set parameter candidates that are required to execute the pyramid structure search in the image processing portion, a display portion for displaying groups of set parameter candidates that are calculated in the control portion as a table, and an input portion used to change set parameter values based on the groups of set parameter candidates displayed on the display portion.
TL;DR: A neurally-inspired model of the primate visual motion system attempting to explain how a hierarchical feedforward network consisting of layers representing cortical areas V1, MT, MST, and 7a detects and classifies different kinds of motion patterns is proposed.
Abstract: The Selective Tuning Model (STM) is a proposal for modelling visual attention in primates and humans. Although supported by significant biological evidence, it is not without its weaknesses. The main one addressed by this paper is that the levels of representation on which it was previously demonstrated (spatial Gaussian pyramids) were not biologically plausible. The motion domain was chosen because enough is known about motion processing to enable a reasonable attempt at defining the feedforward pyramid. The effort is unique because it seems that no past model presents a motion hierarchy plus attention to motion. We propose a neurally-inspired model of the primate visual motion system attempting to explain how a hierarchical feedforward network consisting of layers representing cortical areas V1, MT, MST, and 7a detects and classifies different kinds of motion patterns. The STM is then integrated into this hierarchy, demonstrating that successfully attending to motion patterns results in localization and labelling of those patterns.
TL;DR: An axiomatic framework for the notion of multiresolution connectivity on complete lattices is introduced by means of two equivalent notions: connectivity measures and connectivity pyramids.
Abstract: In this paper, we introduce an axiomatic framework for the notion of multiresolution connectivity on complete lattices. This framework extends the notion of connectivity classes, introduced by Serra in the late eighties. We introduce multiresolution connectivities by means of two equivalent notions: connectivity measures and connectivity pyramids. We present examples of multiresolution connectivities based on pyramids of dilations and of morphological sampling operators. We study the application of multiresolution connectivity to various image analysis tasks, such as pyramid decompositions, hierarchical segmentations, and multiresolution features.
TL;DR: An efficient wavelet-based multiresolution approach to the stereo vision problem is presented, using the theory of representation of operators in spaces spanned by scaling functions to take advantage of a simplified approximation of differentiation.
Abstract: An efficient wavelet-based multiresolution approach to the stereo vision problem is presented. A cost function is defined and iteratively minimized. The minimization is performed on the image representation in wavelet space. We employ the theory of representation of operators in spaces spanned by scaling functions and thereby take advantage of a simplified approximation of differentiation. Examples illustrate the advantages afforded by the application of our algorithm over correlation-based methods.
TL;DR: A novel technique for image enhancement based on a multiscale pyramid is presented, which generalizes the classical edge-oriented wavelet based enhancement technique by employing multiple wavelet features.
Abstract: A novel technique for image enhancement based on a multiscale pyramid is presented. It generalizes the classical edge-oriented wavelet-based enhancement technique by employing multiple wavelet features. Visually relevant features are amplified in the wavelet domain according to a pointwise multiscale product. The use of complex in-phase/quadrature wavelets allows features of any orientation to be emphasized.
TL;DR: A new face detection algorithm based on the 1st-order reduced Coulomb energy (RCE) classifier that locates frontal views of human faces at any degree of rotation and scale in complex scenes is presented.
Abstract: We present a new face detection algorithm based on the 1st-order reduced Coulomb energy (RCE) classifier. The algorithm locates frontal views of human faces at any degree of rotation and scale in complex scenes. The face candidates and their orientations are first determined by computing the Hausdorff distance between a simple face abstraction model and binarized test windows in an image pyramid. Then, after normalizing the energy, each face candidate is verified by two subsequent classifiers: a binary image classifier and the 1st-order RCE classifier. While the binary image classifier is employed as a pre-classifier to discard nonfaces with minimum computational complexity, the 1st-order RCE classifier is used as the main face classifier for final verification. An optimal training method to construct the representative face model database is also presented. Experimental results show that the proposed algorithm yields a high detection ratio with no false alarms.
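The candidate search in this abstract relies on the Hausdorff distance between a model and a binarized window. The sketch below shows the standard definition on sets of "on" pixel coordinates. It illustrates the distance only, not the paper's detector, and the helper names (`on_pixels`, `directed_hausdorff`, `hausdorff`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[int, int]


def on_pixels(window: Sequence[Sequence[int]]) -> List[Point]:
    """Coordinates (row, col) of the nonzero pixels in a binarized window."""
    return [(r, c) for r, row in enumerate(window)
            for c, value in enumerate(row) if value]


def directed_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Largest distance from a point of `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Hausdorff distance; both point sets must be non-empty."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))
```

A small value means every model point lies close to some window point and vice versa. In a pyramid search the same comparison would be repeated over windows at each scale, and a practical system would likely use a faster method than this brute-force double loop.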
TL;DR: In this article, a multi-faceted object representation is used to display different video images on each facet of the representation, and the images on the different facets can be selected to represent different aspects of a common theme, such as datacast information related to a primary source of information.
Abstract: The two dimensional sample rate conversion capabilities of a video display system are used to produce three-dimensional effects. Linear and non-linear scaling is applied to a video image to convey a sense of depth. The three dimensional effects are used to increase the visual appeal of existing and new feature sets in display systems. A multi-faceted object representation, such as a representation of a cube or a pyramid, can be used to display different video images on each facet of the representation. By appropriately scaling each image on each facet, an impression of depth is achieved. The images on the different facets can be selected to represent different aspects of a common theme, such as datacast information related to a primary source of information. Channel changing on a television can be presented as a rotation of the multifaceted object. In like manner, other familiar representations, such as a representation of a book can be used, wherein channel changing is presented as a turning of the pages of the book, each television program being presented on a different page. Advanced features, such as program categorization, can be represented as tabs on the book that facilitate the selection of a particular category. In like manner, a rotation of a multifaceted object about one axis may correspond to a change of channel within a select category, whereas a rotation about another axis may correspond to a change of category. Techniques are presented for achieving these three dimensional effects with calculations that are well suited for execution via the sample rate converters of conventional display systems.
TL;DR: It has been shown that maturity spots can be seen on images of fruit obtained by laser-induced fluorescence and the presented method is able to detect and to quantify these spots automatically.
Abstract: A new method of image processing, developed within the framework of the automatic evaluation of the state of maturity of fruit, is presented. It has been shown that maturity spots can be seen on images of fruit obtained by laser-induced fluorescence. The presented method is able to detect and to quantify these spots automatically. The initial image processing is provided by a watershed algorithm. In order to refine the obtained segmentation, a multiresolution method is developed. It is based on the representation of data with an adaptive pyramid of region adjacency graphs (RAG) associated with a multi-criteria approach. This approach allows regions to be merged and only the information corresponding to the maturity spot to be stored. Interesting experimental results have been obtained on a set of 20 images of apples.
TL;DR: This paper proposes a framework for texture classification through filtering where the filters are derived as the independent components of the input images and each texture is then characterized by the marginal distributions of its filter responses.
Abstract: In this paper we propose a framework for texture classification through filtering. Given a set of textures, the filters are derived as the independent components of the input images and each texture is then characterized by the marginal distributions of its filter responses. The marginal distributions provide a low-dimensional representation of images and result in a significant dimension reduction compared to the full joint distribution. When the components are independent, the dimension reduction does not incur any information loss for classification. The texture classification problem is posed as classifying the textures based on their marginal distributions. Preliminary results demonstrate significant improvement in texture classification performance.
TL;DR: This paper deals with performance improvement of robust PCA algorithms by replacing regular subsampling of images by an irregular image pyramid adapted to the expected image content.
Abstract: In this paper we deal with performance improvement of robust PCA algorithms by replacing regular subsampling of images by an irregular image pyramid adapted to the expected image content. The irregular pyramid is a structure built based on knowledge gained from the training set of images. It represents different regions of the image with different levels of detail, depending on their importance for reconstruction. This strategy enables us to improve reconstruction results, and therefore recognition, significantly. The training algorithm works on the data necessary to perform robust PCA and therefore requires no additional input.
TL;DR: A method for estimating optical flow by a generalization of the brightness constancy assumption to additive transparencies is described, and by a development to its second order, an extension of the optical flow constraint equation is obtained.
Abstract: Motion transparency phenomena in image sequences are frequent, but classical methods of motion estimation are unable to deal with them. This paper describes a method for estimating optical flow by a generalization of the brightness constancy assumption to additive transparencies. This assumption is based on three successive images of a sequence. Thus, by a development to its second order, we obtain an extension of the optical flow constraint equation. The approach assumes that motion is translational on a region large enough to regularize the aperture problem. To avoid outliers caused by violations of the brightness constancy assumption, a robust multi-resolution method is used. It is composed of a low-pass pyramid and an M-estimator technique. This method gives good results on artificial and natural image sequences.
TL;DR: The method for obtaining a compact speech-waveform representation based on frame theory allows an accurate and unambiguous decomposition into voiced and unvoiced components and it is particularly suitable for time scaling and pitch scaling.
Abstract: We describe a method for obtaining a compact speech-waveform representation based on frame theory (Mermelstein, P, 1973). In contrast to earlier frame-theory based representations, the frame used for the representation is continuously adapted to the signal, facilitating an accurate description of both stationary regions and of rapid transitions with a relatively low rate of coefficients (few parameters per second). The representation allows an accurate and unambiguous decomposition into voiced and unvoiced components and it is particularly suitable for time scaling and pitch scaling.
TL;DR: A spectral representation is presented for appearance based image classification and object recognition based on a generative process, derived by partitioning the frequency domain into small disjoint regions and consisting of marginal distributions of those filter responses.
Abstract: We present a spectral representation for appearance based image classification and object recognition. Based on a generative process, the representation is derived by partitioning the frequency domain into small disjoint regions. This gives rise to a set of filters and a representation consisting of marginal distributions of those filter responses. We use a neural network to learn a classifier from training examples. We propose a filter selection algorithm that maximizes the performance over training data. A distinct advantage of our representation is that it can be effectively used for different classification and recognition tasks, which is demonstrated by experiments and comparisons in texture classification, face recognition, and appearance-based 3D object recognition.