TL;DR: A novel texton based representation is developed, which is suited to modeling this joint neighborhood distribution for MRFs, and it is demonstrated that textures can be classified using the joint distribution of intensity values over extremely compact neighborhoods.
Abstract: We question the role that large scale filter banks have traditionally played in texture classification. It is demonstrated that textures can be classified using the joint distribution of intensity values over extremely compact neighborhoods (starting from as small as 3 × 3 pixels square), and that this outperforms classification using filter banks with large support. We develop a novel texton based representation, which is suited to modeling this joint neighborhood distribution for MRFs. The representation is learnt from training images, and then used to classify novel images (with unknown viewpoint and lighting) into texture classes. The power of the method is demonstrated by classifying over 2800 images of all 61 textures present in the Columbia-Utrecht database. The classification performance surpasses that of recent state-of-the-art filter bank based classifiers such as Leung & Malik, Cula & Dana, and Varma & Zisserman.
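The idea of describing a texture by its compact neighborhoods can be sketched as follows. This is only a minimal illustration, not the authors' MRF representation: it collects every 3 × 3 neighborhood as a 9-vector, labels it with the nearest entry of a texton dictionary, and returns the normalized label histogram. The function names and the `textons` list are hypothetical; in practice a dictionary would be learnt from training images, for example by clustering.

```python
from collections import Counter


def neighborhoods_3x3(image):
    """Return the 3x3 neighborhood of every interior pixel as a flat 9-tuple."""
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            patches.append(tuple(
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ))
    return patches


def nearest_texton(patch, textons):
    """Index of the texton closest to the patch (squared Euclidean distance)."""
    return min(
        range(len(textons)),
        key=lambda k: sum((p - t) ** 2 for p, t in zip(patch, textons[k])),
    )


def texton_histogram(image, textons):
    """Normalized frequency of each texton label over all 3x3 neighborhoods."""
    patches = neighborhoods_3x3(image)
    counts = Counter(nearest_texton(p, textons) for p in patches)
    return [counts.get(k, 0) / len(patches) for k in range(len(textons))]
```

A novel image could then be assigned to the class whose training histogram is closest to its own histogram; the choice of histogram distance is left open here.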
TL;DR: The deployment of a DII system in addition to a 3–5 μm IR system through image fusion can increase the performance of human observers when the colour mapping relates to the nature of the visual task and the conditions (scene content) at hand.
TL;DR: In this paper, an improved method for creating high quality virtual reality panoramas is disclosed that yields dramatic improvements during the authoring and projecting cycles, with speeds up to several orders of magnitude faster than prior systems.
Abstract: An improved apparatus and method for creating high quality virtual reality panoramas is disclosed that yields dramatic improvements during the authoring and projecting cycles, with speeds up to several orders of magnitude faster than prior systems. In a preferred embodiment, a series of rectilinear images taken from a plurality of rows are pairwise registered with one another, and locally optimized using a pairwise objective function (local error function) that minimizes certain parameters in a projective transformation, using an improved iterative procedure. The local error function values for the pairwise registrations are then saved and used to construct a quadratic surface to approximate a global optimization function (global error function). The chain rule is used to avoid the direct evaluation of the global objective function, saving computation. In one embodiment concerning the blending aspect of the present invention, an improved procedure is described that relies on Laplacian and Gaussian pyramids, using a blend mask whose boundaries are determined by the grassfire transform. An improved iterative procedure is disclosed for the blending that also determines at what level of the pyramid to perform blending, and results in low frequency image components being blended over a wider region and high frequency components being blended over a narrower region. Human interaction and input is also provided to allow manual projective registration, initial calibration and feedback in the selection of photos and convergence of the system.
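The grassfire transform mentioned above assigns to each pixel of a region its distance to the nearest pixel outside the region. A minimal two-pass sketch with city-block distance is given below; how the patent derives blend mask boundaries from these distances is not specified in the abstract, so the function name and the binary `mask` input are assumptions made for illustration.

```python
def grassfire(mask):
    """City-block distance from each foreground pixel (1) to the nearest
    background pixel (0), computed with the classic two-pass scan."""
    rows, cols = len(mask), len(mask[0])
    far = rows + cols  # larger than any city-block distance inside the grid
    dist = [[0 if mask[r][c] == 0 else far for c in range(cols)]
            for r in range(rows)]

    # Forward pass: propagate distances from the top and left neighbors.
    for r in range(rows):
        for c in range(cols):
            if dist[r][c]:
                if r > 0:
                    dist[r][c] = min(dist[r][c], dist[r - 1][c] + 1)
                if c > 0:
                    dist[r][c] = min(dist[r][c], dist[r][c - 1] + 1)

    # Backward pass: propagate distances from the bottom and right neighbors.
    for r in reversed(range(rows)):
        for c in reversed(range(cols)):
            if dist[r][c]:
                if r < rows - 1:
                    dist[r][c] = min(dist[r][c], dist[r + 1][c] + 1)
                if c < cols - 1:
                    dist[r][c] = min(dist[r][c], dist[r][c + 1] + 1)
    return dist
```

Pixels deep inside a region receive large values and pixels near its edge receive small ones, which is what makes the transform a natural source of blending weights.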
TL;DR: In this paper, a data management system and method for processing, storing, and viewing the extremely large imagery data that is rapidly produced by a linear-array-based microscope slide scanner is provided.
Abstract: A data management system and method for processing, storing, and viewing the extremely large imagery data that is rapidly produced by a linear-array-based microscope slide scanner is provided. The system receives, processes, and stores imagery data produced by the linear-array-based microscope slide scanner at approximately 3 GB per minute. The data are received as a series of overlapping image stripes and combined into a seamless and contiguous baseline image. The baseline image is logically mapped into a plurality of regions that are individually addressed to facilitate viewing and manipulation of the baseline image. The data management system enables imagery data compression while scanning and capturing new image stripes. This advantageously eliminates the overhead associated with storing uncompressed image stripes. The image compression also creates intermediate level images, thereby organizing the baseline image into a variable level pyramid structure referred to as a virtual slide. The data management system efficiently converts image stripes into a high quality virtual slide that allows rapid panning and zooming by image viewing software in accordance with the individually addressed regions. The virtual slide also allows efficient processing by an algorithm framework. The data management system is cost effective and scalable, employs standard image file formats and supports the use of virtual slides in desirable applications such as telemedicine, telepathology, microscopy education, and the analysis of high value specimens such as tissue arrays.
TL;DR: In this article, a rank constraint of corresponding points in two views is used to measure the similarity between trajectories, and a dynamic programming approach is proposed to find the nonlinear time-warping function for videos containing human activities.
Abstract: In this paper, we propose a novel method to establish temporal correspondence between the frames of two videos. 3D epipolar geometry is used to eliminate the distortion generated by the projection from 3D to 2D. Although the fundamental matrix contains the extrinsic property of the projective geometry between views, it is sensitive to noise. Therefore, we propose the use of a rank constraint of corresponding points in two views to measure the similarity between trajectories. This rank constraint shows more robustness and avoids computation of the fundamental matrix. A dynamic programming approach using the similarity measurement is proposed to find the nonlinear time-warping function for videos containing human activities. In this way, videos of different individuals taken at different times and from distinct viewpoints can be synchronized. A temporal pyramid of trajectories is applied to improve the accuracy of the view-invariant dynamic time-warping approach. We show various applications of this approach such as video synthesis, human action recognition, and computer-aided training. Compared to state-of-the-art techniques, our method shows a great improvement.
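The dynamic programming step can be illustrated with a generic dynamic time-warping sketch. It assumes a precomputed matrix `cost[i][j]` holding the dissimilarity between frame i of one video and frame j of the other. In the paper this value would come from the rank-constraint similarity; here the matrix is simply a hypothetical input, and the function name `dtw` is also an assumption.

```python
def dtw(cost):
    """Dynamic time warping over a frame-to-frame dissimilarity matrix.

    Returns the accumulated cost of the best alignment and the warping
    path as a list of (i, j) frame index pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = cost[0][0]

    # Fill the accumulated-cost table; each cell extends the cheapest of
    # its three predecessors (advance in A, advance in B, or both).
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = cost[i][j] + best

    # Backtrack from the last frame pair to the first one.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path
```

The returned path is the discrete form of the nonlinear time-warping function: each pair states which frame of the second video corresponds to a given frame of the first.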
TL;DR: The paper investigates the possibilities of reusing available features for visible images by analyzing the different properties of infrared images and visible images and proposes the following novel features: special projection feature for segmentation, and two-axis pixel-distribution feature for classification.
Abstract: In order to improve the safety of night driving, automatic pedestrian detection has received more and more attention. Since reliability is the most important issue in these systems, multi-dimensional-feature-based segmentation and classification needs to be introduced, and each axis should be efficient and as independent of the others as possible. To choose effective multi-dimensional features for infrared-image-based detection, the paper first investigates the possibilities of reusing available features for visible images by analyzing the different properties of infrared images and visible images. To take advantage of unique properties of infrared images, we propose the following novel features: special projection feature for segmentation, and two-axis pixel-distribution feature for classification. The segmentation based on new features does not depend on many assumptions and is shape-independent, thus avoiding brute-force multiple templates and multi-scale pyramid searching. The novel classification features include a histogram feature and an inertial feature that are independent and complementary; thus the two-dimensional fusion-based classification significantly improves detection accuracy. These proposed features are independent of the conventional pixel-array feature, and can be further fused with other general pedestrian detection features to improve simplicity, speed, and reliability.
TL;DR: A contextual classification strategy for object recognition in remote sensing images is proposed in an attempt to solve recognition tasks operatively, combining a multiresolution feature extraction procedure with soft neural classification based on the multilayer perceptron model.
Abstract: Many cases of remote sensing classification present complicated patterns that cannot be identified on the basis of spectral data alone, but require contextual methods that base class discrimination on the spatial relationships between the individual pixel and local and global configurations of neighboring pixels. However, the use of contextual classification is still limited by critical issues, such as complexity and problem dependency. We propose here a contextual classification strategy for object recognition in remote sensing images in an attempt to solve recognition tasks operatively. The salient characteristics of the strategy are the definition of a multiresolution feature extraction procedure exploiting human perception and the use of soft neural classification based on the multilayer perceptron model. Three experiments were conducted to evaluate the performance of the methodology, one in an easily controlled domain using synthetic images, the other two in real domains involving built-up pattern recognition in panchromatic aerial photographs and high-resolution satellite images.
TL;DR: This paper describes a trajectory design method based on the predicted target state error covariance that uses a pyramid, breadth-first search algorithm to generate real-time paths that achieve a minimum uncertainty bound in fixed time or a desired uncertainty bound in minimum time.
Abstract: The performance of monocular vision based target tracking is a strong function of camera motion. Without motion, the target estimation problem is unsolvable. By designing the camera path, the best possible estimator performance can be achieved. This paper describes a trajectory design method based on the predicted target state error covariance. This method uses a pyramid, breadth-first search algorithm to generate real-time paths that achieve a minimum uncertainty bound in fixed time or a desired uncertainty bound in minimum time.
TL;DR: A focusing strategy from coarse-to-fine scales which leads to an improvement in the accuracy of the registration process of an automatic 3D non-rigid registration method in a multiscale framework is introduced.
Abstract: In this paper, we embed the minimization scheme of an automatic 3D non-rigid registration method in a multiscale framework. The initial model formulation was expressed as a robust multiresolution and multigrid minimization scheme. At the finest level of the multiresolution pyramid, we introduce a focusing strategy from coarse-to-fine scales which leads to an improvement in the accuracy of the registration process. A focusing strategy has been tested for a linear and a non-linear scale-space. Results on real 3D ultrasound images are discussed.
TL;DR: A novel method for intra-frame image processing is presented, which is applicable to a wide variety of medical imaging modalities, like X-ray angiography, X-ray fluoroscopy, magnetic resonance, or ultrasound, and which allows a real-time implementation on standard hardware.
Abstract: We present a novel method for intra-frame image processing, which is applicable to a wide variety of medical imaging modalities, like X-ray angiography, X-ray fluoroscopy, magnetic resonance, or ultrasound. The method reduces noise significantly - by about 4.5 dB and more - while preserving sharp image details. Moreover, selective amplification of image details is possible. The algorithm is based on a multi-resolution approach. Noise reduction is achieved by non-linear adaptive filtering of the individual band pass layers of the multi-resolution pyramid. The adaptivity is controlled by image gradients calculated from the next coarser layer of the multi-resolution pyramid representation, thus exploiting cross-scale dependencies. At sites with strong gradients, filtering is performed only perpendicular to the gradient, i.e. along edges or lines. The multi-resolution approach processes each detail on its appropriate scale, so that small filter kernels are applied even for low frequency noise, thus limiting computational costs and allowing a real-time implementation on standard hardware. In addition, gradient norms are used to distinguish smoothly between “structure” and “noise only” areas, and to perform additional noise reduction and edge enhancement by selectively attenuating or amplifying the corresponding band pass coefficients.
TL;DR: This work proposes to map an irregular pyramid on the agents' society so global constraints can be guaranteed and presents a global analysis of the system through a set of measures compared to an opinion poll in human society.
TL;DR: In this article, the performance of the first-light AO system of LBT is evaluated in terms of obtained Strehl ratios in J-, H-, and K-band.
Abstract: This presentation reports the numerical simulations we have done in order to evaluate the performance of the first-light AO system of LBT. The simulation tool used for this purpose is the Software Package CAOS, applicable to a wide range of AO systems, whose main features are briefly recalled. The whole process of atmospheric propagation of light, wavefront sensing (using a complete model of the pyramid wavefront sensor), wavefront reconstruction (using the LBT672 adaptive secondary mirror modes), and closing of the loop is simulated. The results are given in terms of obtained Strehl ratios in J-, H-, and K-band. Estimates of the resulting sky coverage in K-band for different regions of the sky are also given. A comparison with the performance that would be obtained by using a Shack-Hartmann sensor is presented, confirming the gain achievable with the pyramid sensor.
TL;DR: In this article, a method for encoding an image into an image code-stream is proposed, which generates a reduced resolution representation of the image and then encodes the reduced resolution representation in accordance with a multi-resolution format to form an encoded reduced resolution representation.
Abstract: A method (300) of encoding an image into an image code-stream. The method (300) generates a reduced resolution representation of the image and encodes the reduced resolution representation in accordance with a multi-resolution format to form an encoded reduced resolution representation of the image. The encoded reduced resolution representation is embedded into a first portion of the image code-stream and a compressed representation of the image is encoded into a further portion of the image code-stream.
TL;DR: A probabilistic network model over image spaces and its broad utility in mammographic image analysis is demonstrated, particularly with respect to computer-aided diagnosis and qualitative assessment of model structure through mammographic synthesis.
TL;DR: This investigation will address the differences between additive fusion and feature-level image fusion techniques for enhancing the driver's overall situational awareness.
Abstract: The Night Vision & Electronic Sensors Directorate (NVESD) has conducted a series of image fusion evaluations under the Head-Tracked Vision System (HTVS) program. The HTVS is a driving system for both wheeled and tracked military vehicles, wherein dual-waveband sensors are directed in a more natural head-slewed imaging mode. The HTVS consists of thermal and image-intensified TV sensors, a high-speed gimbal, a head-mounted display, and a head tracker. A series of NVESD field tests over the past two years has investigated the degree to which additive (A+B) image fusion of these sensors enhances overall driving performance. Additive fusion employs a single (but user adjustable) fractional weighting for all the features of each sensor's image. More recently, NVESD and Sarnoff Corporation have begun a cooperative effort to evaluate and refine Sarnoff's "feature-level" multi-resolution (pyramid) algorithms for image fusion. This approach employs digital processing techniques to select at each image point only the sensor with the strongest features, and to utilize only those features to reconstruct the fused video image. This selection process is performed simultaneously at multiple scales of the image, which are combined to form the reconstructed fused image. All image fusion techniques attempt to combine the "best of both sensors" in a single image. Typically, thermal sensors are better for detecting military threats and targets, while image-intensified sensors provide more natural scene cues and detect cultural lighting. This investigation will address the differences between additive fusion and feature-level image fusion techniques for enhancing the driver's overall situational awareness.
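The contrast between the two fusion styles can be sketched on small arrays. Additive fusion applies one fractional weight to every pixel, whereas feature-level fusion chooses, point by point within each band-pass level of a pyramid, the sensor with the stronger response. Measuring "strongest" by absolute coefficient magnitude is an assumption made here; the abstract does not give Sarnoff's actual selection rule, and both function names are hypothetical.

```python
def additive_fusion(image_a, image_b, weight=0.5):
    """A+B fusion: one fractional weight applied to every pixel."""
    return [
        [weight * pa + (1.0 - weight) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]


def select_strongest(band_a, band_b):
    """Feature-level fusion of one band-pass pyramid level: at each point,
    keep the coefficient with the larger magnitude (assumed salience measure)."""
    return [
        [ca if abs(ca) >= abs(cb) else cb for ca, cb in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]
```

In a multi-scale scheme, `select_strongest` would be applied to each level of the two sensors' band-pass pyramids, and the fused levels would then be recombined into a single image.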
TL;DR: In this paper, a method of processing an image to form an image pyramid having multiple image levels is proposed, where a base level image comprising pixel values at pixel locations arranged in rows and columns is received, and the pixel values of the next level image are interpolated using an interpolation filter at the sample locations.
Abstract: A method of processing an image to form an image pyramid having multiple image levels includes receiving a base level image comprising pixel values at pixel locations arranged in rows and columns; determining sample locations for a next level image in the pyramid such that the sample locations are arranged in a regular pattern and the sample locations exceed the range of the pixel locations of the base level image; determining the pixel values of the next level image by interpolating the pixel values of the base level image using an interpolation filter at the sample locations; and treating the next level image as the base level image and repeating steps of determining sample locations and pixel values until a predetermined number of pyramid image levels are generated, or until a predetermined condition is met.
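The repeat-until-done structure of pyramid construction can be sketched as below. Note that this sketch swaps in plain 2 × 2 block averaging for the patent's interpolation filter and does not reproduce its sample locations that exceed the range of the base image; it only illustrates treating each new level as the base of the next, stopping at a predetermined level count or when the image can no longer be halved. The function names are hypothetical.

```python
def downsample_2x(image):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_pyramid(base, max_levels):
    """Generate pyramid levels until max_levels is reached or the current
    level is too small to downsample again."""
    levels = [base]
    while (len(levels) < max_levels
           and len(levels[-1]) >= 2
           and len(levels[-1][0]) >= 2):
        # The level just produced becomes the base for the next one.
        levels.append(downsample_2x(levels[-1]))
    return levels
```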
TL;DR: The morphological pyramid concept is presented, a new approach which can be used for both image analysis and fusion in remote sensing, and the principle of spatial and temporal fusion of remotely sensed images is shown.
Abstract: This paper presents the morphological pyramid concept, a new approach which can be used for both image analysis and fusion in remote sensing.
After a review of multi-source image fusion methods in Earth observation from space, the morphological pyramid is presented. Its properties are described and then the principle of spatial and temporal fusion of remotely sensed images is shown through two examples. The first one deals with multi-date images; the second one combines in an original way two types of images coming from two sensors onboard the same satellite, SPOT 4: the HRVIR sensor provides high resolution images (10 or 20 m) but coarse temporal frequency, and the VEGETATION sensor provides high frequency images with coarse spatial resolution (1 km). Finally, exploitation and analysis of merged images through the pyramid are shown.
TL;DR: A scale-invariant distance measure is proposed for comparing two image representations in terms of multi-scale features and the concept of a feature likelihood map, which is a function normalised to the interval [0, 1], and that approximates the likelihood of image features at all points in scale-space is proposed.
Abstract: This paper presents two approaches for evaluating multi-scale feature-based object models. Within the first approach, a scale-invariant distance measure is proposed for comparing two image representations in terms of multi-scale features. Based on this measure, the maximisation of the likelihood of parameterised feature models allows for simultaneous model selection and parameter estimation.
The idea of the second approach is to avoid an explicit feature extraction step and to evaluate models using a function defined directly from the image data. For this purpose, we propose the concept of a feature likelihood map, which is a function normalised to the interval [0, 1], and that approximates the likelihood of image features at all points in scale-space.
To illustrate the applicability of both methods, we consider the area of hand gesture analysis and show how the proposed evaluation schemes can be integrated within a particle filtering approach for performing simultaneous tracking and recognition of hand models under variations in the position, orientation, size and posture of the hand. The experiments demonstrate the feasibility of the approach, and that real time performance can be obtained by pyramid implementations of the proposed concepts.
TL;DR: This paper presents a method for detection of homogeneous regions in grey-scale images, representing them as blobs, which is non-linear, since it employs robust estimation rather than averaging to move through scale-space.
Abstract: This paper presents a method for detection of homogeneous regions in grey-scale images, representing them as blobs. In order to be fast, and not to favour one scale over others, the method uses a scale pyramid. In contrast to most multi-scale methods this one is non-linear, since it employs robust estimation rather than averaging to move through scale-space. This has the advantage that adjacent and partially overlapping clusters only affect each other's shape, not each other's values. It even allows blobs within blobs, to provide a pyramid blob structure of the image.
TL;DR: In this article, a base signal is recursively decomposed for a desired number of pyramid levels, and the decomposed signal from the lowest level is modified to generate a preprocessed signal that is used to improve the decomposed signal at the next higher level.
Abstract: A method, system, and software are disclosed for improving the quality of a signal. A base signal is recursively decomposed for a desired number of pyramid levels. The decomposed signal from the lowest level is modified to generate a preprocessed signal. The preprocessed signal from the lowest level is used to improve signal components or characteristics of the decomposed signal of the next higher level of the pyramidal decomposition, resulting in a modified signal at the next higher level. In one embodiment, the preprocessed signal includes a filter mask that is used to guide a filtering process on the decomposed signal of the next higher level. In another embodiment, the preprocessed signal includes an up-predicted signal that is combined with the decomposed signal of the next higher level. The preprocessed signal from a lower level is used to generate a modified signal at a higher level. The generation of a preprocessed signal and a modified signal is recursively repeated for each level until the highest level of the pyramidal decomposition is reached, resulting in an improved base signal. The present invention finds particular application in photography and digital film processing, whereby the illustrated method may be used to reduce image noise, thereby improving image quality.
TL;DR: In this paper, a method of multi-resolution with gradient-adaptive filtering (MRGAF) of X-ray images in real time was proposed, where a resolution into a Laplacian pyramid (L0, ... L3) and a Gaussian pyramid (G0, ... G3) is carried out up to the K-th stage.
Abstract: The invention relates to a method of multi-resolution with gradient-adaptive filtering (MRGAF) of X-ray images in real time. For an image strip of 2K adjacent rows, a resolution into a Laplacian pyramid (L0, ... L3) and a Gaussian pyramid (G0, ... G3) is carried out up to the K-th stage. By limiting a processing operation to such a strip, it is possible to keep all relevant data ready in a local memory with rapid access (cache). A further acceleration compared to the conventional algorithm is achieved by calculating the gradient (D) from the Gaussian pyramid representations. By virtue of these and other optimization measures, it is possible to increase a multi-resolution with gradient-adaptive filtering to a processing rate of more than thirty (768 × 564) images per second.
TL;DR: An undecimated directional filter bank derived from the DFB originally proposed by Bamberger and Smith is proposed, which has excellent orientation selectivity, and maintains the low computational complexity from its predecessor.
Abstract: An undecimated directional filter bank (UDFB) derived from the DFB originally proposed by Bamberger and Smith is proposed. The new UDFB has excellent orientation selectivity, and maintains the low computational complexity of its predecessor. While overcomplete, the UDFB presents other properties, like shift invariance, that are desirable for some image analysis applications. Using ladder structures, the overall directional response is readily controlled by 1D prototypes which are easy to design. We combine the UDFB with overcomplete pyramidal structures to form directional pyramids with excellent radial and directional selectivity.
TL;DR: An algorithm for the rigid-body registration of a 3D CT to a set of C-arm images by matching them to computed cone-beam projections of the CT (DRRs) is developed and achieves an accuracy with a mean and a standard deviation of approximately 2.0±1.0 mm.
Abstract: We have developed an algorithm for the rigid-body registration of a 3D CT to a set of C-arm images by matching them to computed cone-beam projections of the CT (DRRs). We precomputed rescaled versions (pyramid) of the CT volume and of the C-arm images. We perform the registration of the CT to the C-arm images starting from their coarsest resolution until we reach some finer resolution that offers a good compromise between time and accuracy. To achieve precision, we use a cubic-spline data model to compute the data pyramids, the DRRs, and the gradient and the Hessian of the cost function. We validate our algorithm on a 3D CT and on C-arm images of a cadaver spine using fiducial markers. When registering the CT to two C-arm images, our algorithm operates safely if the angle between the two image planes is larger than 10°. It achieves an accuracy of approximately 2.0±1.0 mm.
TL;DR: An FIR structure to handle the computation along the borders using symmetry extension, a new BlockRam configuration for multi ports shift register, and a mathematical approach to predict and reduce the error dynamic range due to wordlength rounding are proposed.
Abstract: This paper gives a design framework for the implementation of the 2-D Orthogonal Discrete Wavelet Transform (DWT) on FPGA. The architecture is based on the Pyramid Algorithm Analysis. It spatially maps the multistage filter banks of the DWT onto the Xilinx Virtex-E FPGA family using on-chip buffering. The architecture takes advantage of the low rate of the high transform stages to reuse the logic. In this paper, we propose a novel FIR structure to handle the computation along the borders using symmetry extension, a new BlockRAM configuration for multi-port shift registers, and a new mathematical approach to predict and reduce the error dynamic range due to wordlength rounding. For an M×M input image, our architecture has a period of M² clock cycles and requires the minimum storage size. The architecture is highly scalable for different filter lengths and numbers of octaves. The implementation results for a specific 2-D Daubechies-4 Wavelet transform are included.
TL;DR: To reduce degradation in video quality, the proposed algorithm performs motion estimation and motion compensated frame rate up-conversion at each level of the Gaussian/Laplacian image pyramid.
Abstract: We present a hierarchical motion compensated frame rate conversion (HMC-FRC) algorithm based on the pyramid structure for high-quality video reconstruction. The conversion between images having different frame rates causes motion jitter and blurring near moving object boundaries. To reduce degradation in video quality, the proposed algorithm performs motion estimation (ME) and motion compensated frame rate up-conversion at each level of the Gaussian/Laplacian image pyramid. In experiments, the frame rate of the video sequence is up-converted by a factor of two. Experiments with several test sequences show the effectiveness of the proposed algorithm.
TL;DR: In this article, the texture source image differential representation is copied to a location corresponding to the identified modification region to generate a new differential representation for the modification region, which is integrated to produce a modified image.
Abstract: Techniques for modifying an image may be applied to heal texture areas within the image. A region to be healed in an original image may be identified, and a differential representation may be calculated for at least a portion of a texture source image that provides sample texture. Samples of the texture source image differential representation may be copied to a location corresponding to the identified modification region to generate a new differential representation for the modification region. The new differential representation for the modification region may be integrated to produce a modified image. In some implementations, a differential representation may be calculated of boundary pixels that are outside of and adjacent to the region to be healed in the original image. Copying samples of the texture source image differential representation may be performed so as to obtain substantial smoothness between the copied samples and the differential boundary pixel values.
TL;DR: In this article, the authors describe a technique for adjusting images by modifying a differential representation of a source image based on a structural representation of that image and on user input, and then generating a modified image from the modified differential representation by solving a Poisson differential equation.
Abstract: Methods and apparatus implementing systems and techniques for adjusting images. In general, in one implementation, the technique includes: receiving input defining an adjustment to be applied to a differential representation of a source image, calculating the differential representation of the source image, producing a structural representation of the source image, the structural representation corresponding to multiple types of contrast in the source image, modifying the differential representation based on the structural representation and the input defining the adjustment, and generating a modified image from the modified differential representation by solving a Poisson differential equation.
TL;DR: In this article, a method and system for creating 3D models of implant-bearing dental arches, and other anatomical fields of view, employs three-dimensional scanning means to capture images of an anatomical field of view wherein there have been positioned (and preferably affixed to an anatomical feature) one or more 3D recognition objects having a known geometry, such as a pyramid or a linked grouping of spheres.
Abstract: A method and system for creating three-dimensional models of implant-bearing dental arches (Fig. 4), and other anatomical fields of view, employs three-dimensional scanning means to capture images of an anatomical field of view wherein there have been positioned (and preferably affixed to an anatomical feature) one or more three-dimensional recognition objects having a known geometry, such as a pyramid or a linked grouping of spheres. Image processing software is employed to locate and orient said recognition objects as reference data for stitching multiple images and thereby reconstructing the scanned field of view. Recognition objects placed in areas of low feature definition enhance the accuracy of three-dimensional modeling of such areas.
TL;DR: This paper presents a set of hardware modules which form the basis for three vision applications: Target Tracking, Image Stabilization and Image Mosaicking, and shows the performance statistics for tracking more than one target using the basic modules.
Abstract: In this paper we present a set of hardware modules which form the basis for three vision applications: Target Tracking, Image Stabilization and Image Mosaicking. The two main modules are the pyramidal module and the multiresolution correlation module. They were implemented using the Handel-C language, and tested on the Celoxica RC1000 development platform, which has a Virtex-E FPGA. We show the performance statistics for tracking more than one target using the basic modules, and present results of the applications implemented based on these basic modules.
TL;DR: Experiments demonstrated that a log-polar transformation was superior to both the image pyramid and the traditional method for separating a target from a distracting background, and comparatively enhanced the tracking performance.
Abstract: An active stereo-vision system enables a target object to be localized based on passing small disparities without heavy computation to identify the target. However, this simple method is not applicable to situations where a distracting background is included or the target and other objects are simultaneously located in the zero disparity area. Accordingly, to alleviate these problems, the current study combined filtering and foveation, which employs high resolution in the center of the visual field, while suppressing the periphery. An image pyramid and log-polar transformation are compared for the foveated image representation. The stereo disparity of the target is also extracted using projection to maintain a small stereo disparity during tracking. Experiments demonstrated that a log-polar transformation was superior to both the image pyramid and the traditional method for separating a target from a distracting background, and comparatively enhanced the tracking performance.
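The foveating effect of a log-polar representation follows from its coordinate mapping, sketched below. A point is described by the logarithm of its distance from the fixation center and by its angle, so equal steps in the log-radius cover ever larger areas toward the periphery. The function names and the choice of the natural logarithm are assumptions; the abstract does not describe the sampling grid used in the study.

```python
import math


def to_log_polar(x, y, cx, cy):
    """Map image coordinates to (log-radius, angle) about the center (cx, cy)."""
    dx, dy = x - cx, y - cy
    radius = math.hypot(dx, dy)
    if radius == 0:
        raise ValueError("the center itself has no log-polar coordinate")
    return math.log(radius), math.atan2(dy, dx)


def from_log_polar(rho, theta, cx, cy):
    """Inverse mapping from (log-radius, angle) back to image coordinates."""
    radius = math.exp(rho)
    return cx + radius * math.cos(theta), cy + radius * math.sin(theta)
```

One consequence of the mapping is that scaling a pattern about the center only shifts it along the log-radius axis, leaving the angle unchanged.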