TL;DR: This work considers a standard non-spatial representation in which the frequencies but not the locations of quantized image features are used to discriminate between classes analogous to how words are used for text document classification without regard to their order of occurrence, and considers two spatial extensions.
Abstract: We investigate bag-of-visual-words (BOVW) approaches to land-use classification in high-resolution overhead imagery. We consider a standard non-spatial representation in which the frequencies but not the locations of quantized image features are used to discriminate between classes analogous to how words are used for text document classification without regard to their order of occurrence. We also consider two spatial extensions, the established spatial pyramid match kernel which considers the absolute spatial arrangement of the image features, as well as a novel method which we term the spatial co-occurrence kernel that considers the relative arrangement. These extensions are motivated by the importance of spatial structure in geographic data. The methods are evaluated using a large ground truth image dataset of 21 land-use classes. In addition to comparisons with standard approaches, we perform extensive evaluation of different configurations such as the size of the visual dictionaries used to derive the BOVW representations and the scale at which the spatial relationships are considered. We show that even though BOVW approaches do not necessarily perform better than the best standard approaches overall, they represent a robust alternative that is more effective for certain land-use classes. We also show that extending the BOVW approach with our proposed spatial co-occurrence kernel consistently improves performance.
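The non-spatial representation described in this abstract can be illustrated with a minimal sketch. It assumes a visual dictionary is already available (for example from clustering) and uses squared Euclidean distance for quantization; the function names `nearest_word` and `bovw_histogram` are hypothetical and do not come from the paper.

```python
from collections import Counter


def nearest_word(feature, dictionary):
    """Index of the dictionary entry closest to `feature` (squared Euclidean)."""
    return min(
        range(len(dictionary)),
        key=lambda k: sum((f - d) ** 2 for f, d in zip(feature, dictionary[k])),
    )


def bovw_histogram(features, dictionary):
    """Normalized visual-word frequencies; feature locations are ignored."""
    counts = Counter(nearest_word(f, dictionary) for f in features)
    total = len(features)
    if total == 0:
        return [0.0] * len(dictionary)
    return [counts[k] / total for k in range(len(dictionary))]
```

Because only frequencies are kept, shuffling the feature list leaves the histogram unchanged, which is the property the spatial extensions are meant to relax.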
TL;DR: This work shows that with an appropriate combination of kernels a significant boost in classification performance is possible, and indicates the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.
Abstract: Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) provide a framework for deriving regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. Our probabilistic formulation provides a principled way to learn hyperparameters, which we utilize to learn an optimal combination of multiple covariance functions. It also offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We show that with an appropriate combination of kernels a significant boost in classification performance is possible. Further, our experiments indicate the utility of active learning with probabilistic predictive models, especially when the amount of training data labels that may be sought for a category is ultimately very small.
TL;DR: It is found that the added difficulty of verification produced by age gaps becomes saturated after the gap is larger than four years, for gaps of up to ten years, and image quality and eyewear present more of a challenge than facial hair.
Abstract: Face verification in the presence of age progression is an important problem that has not been widely addressed. In this paper, we study the problem by designing and evaluating discriminative approaches. These directly tackle verification tasks without explicit age modeling, which is a hard problem by itself. First, we find that the gradient orientation, after discarding magnitude information, provides a simple but effective representation for this problem. This representation is further improved when hierarchical information is used, which results in the use of the gradient orientation pyramid (GOP). When combined with a support vector machine, GOP demonstrates excellent performance in all our experiments, in comparison with seven different approaches including two commercial systems. Our experiments are conducted on the FGnet dataset and two large passport datasets, one of them being the largest ever reported for recognition tasks. Second, taking advantage of these datasets, we empirically study how age gaps and related issues (including image quality, spectacles, and facial hair) affect recognition algorithms. Surprisingly, we found that the added difficulty of verification produced by age gaps becomes saturated after the gap is larger than four years, for gaps of up to ten years. In addition, we find that image quality and eyewear present more of a challenge than facial hair.
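A rough sketch of the idea of keeping gradient orientation while discarding magnitude, computed over a small image pyramid, is given below. The central-difference gradient, the 2x2 averaging used for downsampling, and all function names are assumptions made for illustration; the abstract does not specify these details.

```python
import math


def downsample(img):
    """Halve resolution by averaging non-overlapping 2x2 blocks (assumed scheme)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def gradient_orientations(img, eps=1e-9):
    """Unit gradient vectors at interior pixels; magnitude is discarded."""
    out = []
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = (img[r][c + 1] - img[r][c - 1]) / 2.0
            gy = (img[r + 1][c] - img[r - 1][c]) / 2.0
            mag = math.hypot(gx, gy)
            if mag < eps:
                out.extend((0.0, 0.0))  # flat region: no defined orientation
            else:
                out.extend((gx / mag, gy / mag))
    return out


def gradient_orientation_pyramid(img, levels=2):
    """Concatenate orientation features from successively coarser levels."""
    feats = []
    for _ in range(levels):
        feats.extend(gradient_orientations(img))
        if len(img) < 2 or len(img[0]) < 2:
            break
        img = downsample(img)
    return feats
```

Since magnitudes are normalized away, multiplying the image intensities by a positive constant leaves the descriptor unchanged.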
TL;DR: This paper discusses the implementation of three categories of image fusion algorithms – the basic fusion algorithms, the pyramid based algorithms and the basic DWT algorithms, developed as an Image Fusion Toolkit - ImFus, using Visual C++ 6.0.
Abstract: Image Fusion is a process of combining the relevant information from a set of images, into a single image, wherein the resultant fused image will be more informative and complete than any of the input images. This paper discusses the implementation of three categories of image fusion algorithms – the basic fusion algorithms, the pyramid based algorithms and the basic DWT algorithms, developed as an Image Fusion Toolkit - ImFus, using Visual C++ 6.0. The objective of the paper is to assess the wide range of algorithms together, which is not found in the literature. The fused images were assessed using Structural Similarity Image Metric (SSIM) [10], Laplacian Mean Squared Error along with seven other simple image quality metrics that helped us measure the various image features; which were also implemented as part of the toolkit. The readings produced by the image quality metrics, based on the image quality of the fused images, were used to assess the algorithms. We used Pareto Optimization method to figure out the algorithm that consistently had the image quality metrics produce the best readings. An assessment of the quality of the fused images was additionally performed with the help of ten respondents based on their visual perception, to verify the results produced by the metric based assessment. Coincidentally, both the assessment methods matched in their ranking of the algorithms. The Pareto Optimization method picked DWT with Haar fusion method as the one with the best image quality metrics readings. The result here was substantiated by the visual perception based method where it was inferred that fused images produced by DWT with Haar fusion method were marked the best 63.33% of the time, which was far better than any other algorithm. Both the methods also matched in assessing Morphological Pyramid method as producing fused images of inferior quality.
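To illustrate how a Pareto-style comparison across several quality metrics might work, here is a minimal sketch. It assumes every metric has been oriented so that larger is better, which the abstract does not state; the names `dominates` and `pareto_front` are hypothetical, and any scores passed in are the caller's own data rather than results from the paper.

```python
def dominates(a, b):
    """True if score tuple `a` is at least as good as `b` everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Names of algorithms whose metric tuples are not dominated by any other."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]
```

For example, `pareto_front({"A": (0.9, 0.8), "B": (0.7, 0.6), "C": (0.6, 0.95)})` keeps `A` and `C` because `B` is worse than `A` on both metrics; the numbers here are invented purely for illustration.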
TL;DR: In this article, a genetic optimization based approach is proposed to find the optimum weights corresponding to each facial region for matching, and the information obtained from different levels of the Laplacian pyramid is combined to improve the identification accuracy.
Abstract: This paper presents an efficient algorithm for matching sketches with digital face images. The algorithm extracts discriminating information present in local facial regions at different levels of granularity. Both sketches and digital images are decomposed into multi-resolution pyramid to conserve high frequency information which forms the discriminating facial patterns. Extended uniform circular local binary pattern based descriptors use these patterns to form a unique signature of the face image. Further, for matching, a genetic optimization based approach is proposed to find the optimum weights corresponding to each facial region. The information obtained from different levels of Laplacian pyramid are combined to improve the identification accuracy. Experimental results on sketch-digital image pairs from the CUHK and IIIT-D databases show that the proposed algorithm can provide better identification performance compared to existing algorithms.
TL;DR: A novel approach for global target tracking based on mean shift technique is proposed, termed as adaptive pyramid mean shift, because it uses the pyramid analysis technique and can determine the pyramid level adaptively to decrease the number of iterations required to achieve convergence.
TL;DR: The paper presents a novel Haarlet Pyramid based iris recognition technique, which is done using the image feature set extracted from Haar Wavelets at various levels of decomposition, and shows that Haarlets level-5 outperforms other Haarlets.
Abstract: Iris recognition has been a fast growing, challenging and interesting area in real-time applications. A large number of iris recognition algorithms have been developed over the decades. The paper presents a novel Haarlet Pyramid based iris recognition technique. Here iris recognition is done using the image feature set extracted from Haar Wavelets at various levels of decomposition. The proposed method was analyzed in terms of the False Acceptance Rate and the Genuine Acceptance Rate. The proposed technique is tested on an iris image database having 384 images. The results show that Haarlets level-5 outperforms other Haarlets, because the higher level Haarlets are giving very fine texture features while the lower level Haarlets are representing very coarse texture features which are less useful for discrimination of images in iris recognition.
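As a hedged illustration of extracting features from Haar wavelets at a chosen decomposition level, the sketch below keeps only the approximation (low-low) band of an orthonormal 2-D Haar transform. Whether the paper uses this band alone is not stated, so that choice and the function names are assumptions.

```python
def haar_approximation(img):
    """One level of the 2-D Haar approximation band (orthonormal scaling)."""
    if len(img) % 2 or len(img[0]) % 2:
        raise ValueError("image sides must be even at every level")
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * r][2 * c] + img[2 * r][2 * c + 1]
             + img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 2.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def haar_pyramid_features(img, level):
    """Flattened approximation coefficients after `level` decompositions."""
    for _ in range(level):
        img = haar_approximation(img)
    return [v for row in img for v in row]
```

Each additional level quarters the feature vector length, so the choice of level trades descriptor size against the amount of detail retained.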
TL;DR: A novel algorithm that permits the fast and accurate computation of the Legendre image moments is introduced in this paper, based on the block representation of an image and on a new image representation scheme, the Image Slice Representation (ISR) method.
TL;DR: The experimental results on several pairs of multifocus images show that the proposed method can achieve good results and exhibit clear advantages over the gradient pyramid transform and discrete wavelet transform techniques.
Abstract: Image fusion is a process of integrating complementary information from multiple images of the same scene such that the resultant image contains a more accurate description of the scene than any of the individual source images. A method for fusion of multifocus images is presented. It combines the traditional pixel-level fusion with some aspects of feature-level fusion. First, multifocus images are decomposed using a redundant wavelet transform (RWT). Then the edge features are extracted to guide coefficient combination. Finally, the fused image is reconstructed by performing the inverse RWT. The experimental results on several pairs of multifocus images show that the proposed method can achieve good results and exhibit clear advantages over the gradient pyramid transform and discrete wavelet transform techniques.
TL;DR: In this paper, an automated, computerized method is provided for processing an image, which comprises the steps of converting a color band representation of the image to a homogeneous representation of spectral and spatial characteristics of a texture region in the image.
Abstract: In an exemplary embodiment of the present invention, an automated, computerized method is provided for processing an image. According to a feature of the present invention, the method comprises the steps of converting a color band representation of the image to a homogeneous representation of spectral and spatial characteristics of a texture region in the image and utilizing the homogeneous representation of spectral and spatial characteristics of a texture region in the image to identify homogeneous tokens in the image.
TL;DR: The resulting system is comparable with the state-of-the-art methods when evaluated on the challenging public PASCAL 2007 and 2009 datasets.
Abstract: This paper presents a new object representation, Active Mask Hierarchies (AMH), for object detection. In this representation, an object is described using a mixture of hierarchical trees where the nodes represent the object and its parts in pyramid form. To account for shape variations at a range of scales, a dictionary of masks with varied shape patterns is attached to the nodes at different layers. The shape masks are "active" in that they enable parts to move with different displacements. The masks in this active hierarchy are associated with histograms of words (HOWs) and oriented gradients (HOGs) to enable rich appearance representation of both structured (e.g., cat face) and textured (e.g., cat body) image regions. Learning the hierarchical model is a latent SVM problem which can be solved by the incremental concave-convex procedure (iCCCP). The resulting system is comparable with the state-of-the-art methods when evaluated on the challenging public PASCAL 2007 and 2009 datasets.
TL;DR: A probabilistic model for supervised dictionary learning (SDLM) is proposed which seamlessly combines an unsupervised model (a Gaussian Mixture Model) and a supervised model (a logistic regression model) in a probabilistic framework and is extended to incorporate spatial information during the dictionary learning process in a spatial pyramid matching like manner.
Abstract: Dictionary generation is a core technique of the bag-of-visual-words (BOV) models when applied to image categorization. Most previous approaches generate dictionaries by unsupervised clustering techniques, e.g. k-means. However, the features obtained by such dictionaries may not be optimal for image classification. In this paper, we propose a probabilistic model for supervised dictionary learning (SDLM) which seamlessly combines an unsupervised model (a Gaussian Mixture Model) and a supervised model (a logistic regression model) in a probabilistic framework. In the model, image category information directly affects the generation of a dictionary. A dictionary obtained by this approach is a trade-off between minimization of distortions of clusters and maximization of discriminative power of image-wise representations, i.e. histogram representations of images. We further extend the model to incorporate spatial information during the dictionary learning process in a spatial pyramid matching like manner. We extensively evaluated the two models on various benchmark datasets and obtained promising results.
TL;DR: Results show that the Recursive Coarse-to-Fine Localization (RCFL) achieves a 12x speed-up compared to standard sliding windows, and compared with a cascade of multiple resolutions approach the method has slightly better performance in speed and Average-Precision.
Abstract: Cascading techniques are commonly used to speed-up the scan of an image for object detection. However, cascades of detectors are slow to train due to the high number of detectors and corresponding thresholds to learn. Furthermore, they do not use any prior knowledge about the scene structure to decide where to focus the search. To handle these problems, we propose a new way to scan an image, where we couple a recursive coarse-to-fine refinement together with spatial constraints of the object location. For doing that we split an image into a set of uniformly distributed neighborhood regions, and for each of these we apply a local greedy search over feature resolutions. The neighborhood is defined as a scanning region that only one object can occupy. Therefore the best hypothesis is obtained as the location with maximum score and no thresholds are needed. We present an implementation of our method using a pyramid of HOG features and we evaluate it on two standard databases, VOC2007 and INRIA dataset. Results show that the Recursive Coarse-to-Fine Localization (RCFL) achieves a 12x speed-up compared to standard sliding windows. Compared with a cascade of multiple resolutions approach our method has slightly better performance in speed and Average-Precision. Furthermore, in contrast to cascading approach, the speed-up is independent of image conditions, the number of detected objects and clutter.
TL;DR: In this paper, the authors describe methods for classifying an input image by detecting one or more feature points on the input image; extracting one or multiple descriptors from each feature point; applying a codebook to quantize each descriptor and generate code from each descriptor; applying spatial pyramid matching to generate histograms; and concatenating histograms from all sub-regions to generate a final representation of the image for classification.
Abstract: Systems and methods are disclosed for classifying an input image by detecting one or more feature points on the input image; extracting one or more descriptors from each feature point; applying a codebook to quantize each descriptor and generate code from each descriptor; applying spatial pyramid matching to generate histograms; and concatenating histograms from all sub-regions to generate a final representation of the image for classification.
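The last two steps of this pipeline (per-region histograms, then concatenation) can be sketched as follows. The sketch assumes descriptors have already been quantized to integer codes and that level `l` splits the image into a `2**l` by `2**l` grid; the function name and the `(x, y, code)` input layout are hypothetical, and level weighting is omitted.

```python
def spatial_pyramid_histogram(points, width, height, vocab_size, levels=2):
    """Concatenate code histograms over all grid cells of every pyramid level.

    `points` is an iterable of (x, y, code) with 0 <= x <= width,
    0 <= y <= height and 0 <= code < vocab_size.
    """
    representation = []
    for level in range(levels):
        cells = 2 ** level
        hists = [[0] * vocab_size for _ in range(cells * cells)]
        for x, y, code in points:
            # Clamp so points on the right or bottom border fall in the last cell.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hists[cy * cells + cx][code] += 1
        for hist in hists:
            representation.extend(hist)
    return representation
```

With `levels=2` and a vocabulary of size `V`, the result has `V * (1 + 4)` entries: one global histogram followed by four quadrant histograms in row-major order.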
TL;DR: Experimental results show that the proposed method for fast text localization in natural scene images provides competitive localization performance at high speed.
Abstract: This paper proposes a new method for fast text localization in natural scene images by combining learning-based region filtering and verification in a coarse-to-fine strategy. In each pyramid layer, a boosted region filter is used to extract candidate text regions, which are segmented into candidate text lines by multi-orientation projection analysis. A polynomial classifier with combined features is used to verify patches of candidate text lines for removing non-texts. The remaining text patches over all pyramid layers are grouped into text lines based on their spatial relationships. The text lines are further refined and partitioned into words by connected component analysis. Experimental results show that the proposed method provides competitive localization performance at high speed.
TL;DR: In this paper, a method, system and computer program product for representing an image is provided. The image is represented in the form of a Gaussian pyramid, which is a scale-space representation of the image and includes several pyramid images.
Abstract: A method, system and computer program product for representing an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.
TL;DR: A scene classification method, which combines two popular methods in the literature: Spatial Pyramid Matching (SPM) and probabilistic Latent Semantic Analysis (pLSA) modeling, and it is seen that the proposed method slightly outperforms the others in that particular dataset.
Abstract: We propose a scene classification method, which combines two popular methods in the literature: Spatial Pyramid Matching (SPM) and probabilistic Latent Semantic Analysis (pLSA) modeling. The proposed scheme called Cascaded pLSA performs pLSA in a hierarchical sense after the soft-weighted BoW representation based on dense local features is extracted. We associate spatial layout information by dividing each image into overlapping regions iteratively at different resolution levels and implementing a pLSA model for each region individually. Finally, an image is represented by concatenated topic distributions of each region. In performance evaluation, we compare the proposed method with the most successful methods in the literature, using the popular 15-class-dataset. In the experiments, it is seen that our method slightly outperforms the others in that particular dataset.
TL;DR: A face recognition system with low-memory requirement and accurate recognition is presented, based on extraction of features with the DCT pyramid, in contrast to the conventional method of wavelet decomposition.
Abstract: Face recognition (FR) is a challenging issue due to variations in pose, illumination, and expression. In this paper, a face recognition system with low-memory requirement and accurate recognition is presented. It is based on extraction of features with the DCT pyramid, in contrast to the conventional method of wavelet decomposition. The DCT pyramid performed on each face image decomposes it into an approximation subband and the reversed L-shape blocks containing the high frequency coefficients of the DCT pyramid. A set of simple block-based statistical measures is calculated from the extracted DCT pyramid subbands. This set of statistical measures is an efficient way of reducing the dimensionality of the feature vectors. Experimental results on the standard ORL and FERET databases show that the proposed method achieves more accurate face recognition than the wavelet-based methods. Moreover, it outperforms the other well known methods such as PCA and the block-based DCT with the zigzag scanning in terms of accuracy and memory requirement.
TL;DR: A novel signal representation using fuzzy mathematical morphology is developed which provides results analogous to those given by the polynomial transform and is illustrated in data compression and fractal dimension estimation for temporal signals and images.
Abstract: A novel signal representation using fuzzy mathematical morphology is developed. We take advantage of the optimum fuzzy fitting and the efficient implementation of morphological operators to extract geometric information from signals. The new representation provides results analogous to those given by the polynomial transform. Geometrical decomposition of a signal is achieved by windowing and applying sequentially fuzzy morphological opening with structuring functions. The resulting representation is made to resemble an orthogonal expansion by constraining the results of opening to equate adapted structuring functions. Properties of the geometric decomposition are considered and used to calculate the adaptation parameters. Our procedure provides an efficient and flexible representation which can be efficiently implemented in parallel. The application of the representation is illustrated in data compression and fractal dimension estimation for temporal signals and images.
TL;DR: A new template matching method accelerated by an integral image is proposed that needs less memory than the conventional approach to maintain block sums of candidates and can be easily extended to nonsquare (rectangular) template matching.
Abstract: A new template matching method accelerated by an integral image is proposed. In contrast to the conventional winner-update template matching algorithm, the proposed scheme uses an integral image instead of a block sum pyramid to represent the search area. When an integral image is used, block sums on the lowest level are evaluated very fast. As a result, the speed with which nonbest candidates are rejected is nearly double that of the conventional scheme. Moreover, the proposed scheme needs less memory than the conventional approach to maintain block sums of candidates and can be easily extended to nonsquare (rectangular) template matching.
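The reason block sums become cheap with an integral image can be shown with a short sketch: after one pass over the search area, any rectangular sum needs only four lookups. This illustrates the integral image itself, not the winner-update matching scheme; the function names are hypothetical.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top and left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += img[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def block_sum(ii, top, left, height, width):
    """Sum of the height x width block whose top-left pixel is (top, left)."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Because `height` and `width` are independent arguments, the same lookup serves square and non-square blocks alike.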
TL;DR: The concept of the direction image multiresolution is discussed, which is derived as a property of the 2-D discrete Fourier transform, when it splits by 1-D transforms, and the resolution map is introduced, as a result of uniting all direction images into log2 N series.
Abstract: We discuss the concept of the direction image multiresolution, which is derived as a property of the 2-D discrete Fourier transform, when it splits by 1-D transforms. The N×N image, where N is a power of 2, is considered as a unique set of splitting-signals in paired representation, which is the unitary 2-D frequency and 1-D time representation. The number of splitting-signals is 3N−2, and they have different durations, carry the spectral information of the image in disjoint subsets of frequency points, and can be calculated from the projection data along one of 3N/2 angles. The paired representation leads to the image composition by a set of 3N−2 direction images, which defines the directed multiresolution and contains periodic components of the image. We also introduce the concept of the resolution map, as a result of uniting all direction images into log2 N series. In the resolution map, all different periodic components (or structures) of the image are packed into a N×N matrix, which can be used for image processing in enhancement, filtration, and compression.
TL;DR: A new approach to image retrieval based on color, texture, and shape by using pyramid structure wavelet is presented and the receiver operating characteristic (ROC) curve is generated to assess the results.
Abstract: As technology continues to increase the various formats in which medical images are created, transmitted, and analyzed, it has become more necessary to restrict the different ways in which this data is stored and formatted between the conflicting modalities. There is a significant increase in the use of medical images in clinical medicine, disease research, and education. While the literature lists several successful systems for content-based image retrieval and image management methods, they have been unable to make significant inroads in routine medical informatics. This paper presents a new approach to image retrieval based on color, texture, and shape by using pyramid structure wavelet. The major advantage of such an approach is that little human intervention is required. However, most of these systems only allow a user to query using a complete image with multiple regions and are unable to retrieve similar looking images based on a single region. Experimental results of the query system on different test image databases are given. This paper introduces a comparative study between color, texture, shape and the pyramid structure wavelet technique and generates the receiver operating characteristic (ROC) curve to assess the results. The area under the curve is 0.58 when using color, 0.68 when using shape, 0.74 when using texture, and 0.8 when using the wavelet technique.
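For readers unfamiliar with the area-under-the-curve figures quoted here, the following sketch shows one standard way to compute it: the fraction of (relevant, non-relevant) pairs that a scoring function ranks correctly, counting ties as half. How the paper itself computed the area is not stated, and `roc_auc` is a hypothetical name.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to a perfect separation of relevant from non-relevant images.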
TL;DR: A partial Hausdorff distance measurement based on image contour matching method is proposed, which is fit for the inshore ship search and location.
Abstract: Inshore ship detection has significant practical meaning, especially for target change detection. However, it is difficult to realize inshore ship detection using the traditional area-based method because of the complex background. A partial Hausdorff distance measurement based on an image contour matching method is proposed, which is suited to inshore ship search and location. The main characteristics of the proposed method are: 1) a fast distance transform and pyramid decomposition are used to speed up the Hausdorff distance matching; 2) a pyramid is constructed from the original image to avoid over-sampling of the contour. Experiments with satellite images are carried out to validate and analyze the proposed method.
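The partial Hausdorff distance mentioned here can be sketched in its usual directed form: rank the nearest-neighbour distances from one contour's points to the other contour and take a chosen quantile instead of the maximum. This brute-force version omits the distance transform and pyramid speed-ups described in the abstract; the function name and the `fraction` parameter are illustrative assumptions.

```python
import math


def directed_partial_hausdorff(a, b, fraction=0.8):
    """K-th smallest nearest-neighbour distance from points in `a` to set `b`.

    K = ceil(fraction * len(a)); fraction=1.0 gives the classical directed
    Hausdorff distance.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    dists = sorted(min(math.dist(p, q) for q in b) for p in a)
    k = max(1, math.ceil(fraction * len(dists)))
    return dists[k - 1]
```

Taking a quantile below 1.0 lets a fraction of contour points be outliers, which is what makes the measure tolerant of clutter and partial occlusion.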
TL;DR: A thorough experimental evaluation of the two methods for solving the correspondence problem via the definition of a kernel function that makes it possible to use local features as input to a support vector machine shows that the exact method performs consistently better than the approximate one.
Abstract: Local features have repeatedly shown their effectiveness for object recognition in recent years, and they have consequently become the preferred descriptor for this type of problem. The solution of the correspondence problem is traditionally approached with exact or approximate techniques. In this paper we are interested in methods that solve the correspondence problem via the definition of a kernel function that makes it possible to use local features as input to a support vector machine. We single out the match kernel, an exact approach, and the pyramid match kernel, which instead uses an approximate strategy. We present a thorough experimental evaluation of the two methods on three different databases. Results show that the exact method performs consistently better than the approximate one, especially for the object identification task, when training on a decreasing number of images. Based on these findings and on the computational cost of each approach, we suggest some criteria for choosing between the two kernels given the application at hand.
TL;DR: This work presents a scheme that creates a visually smooth mipmap pyramid from stitched imagery at several scales by using a nonlinear operator to inject detail from the fine image into the coarse one while retaining color consistency.
Abstract: Multiscale imagery often combines several sources with differing appearance. For instance, Internet-based maps contain satellite and aerial photography. Zooming within these maps may reveal jarring transitions. We present a scheme that creates a visually smooth mipmap pyramid from stitched imagery at several scales. The scheme involves two new techniques. The first, structure transfer, is a nonlinear operator that combines the detail of one image with the local appearance of another. We use this operator to inject detail from the fine image into the coarse one while retaining color consistency. The improved structural similarity greatly reduces inter-level ghosting artifacts. The second, clipped Laplacian blending, is an efficient construction to minimize blur when creating intermediate levels. It considers the sum of all inter-level image differences within the pyramid. We demonstrate continuous zooming of map imagery from space to ground level.
TL;DR: The results show that precision and recall of Haar Wavelets are better than complete Haar transform based CBIR, which proves that Haar Wavelets give better discrimination capability in image retrieval at higher query execution speed, per higher level Haar wavelets.
Abstract: The paper presents the Wavelet Pyramid based image retrieval techniques [1] using Haar transform. Here content based image retrieval (CBIR) is done using the image feature set extracted from Haar Wavelets applied on the image at various levels of decomposition. Here the database image features are extracted by applying Haar Wavelets on gray plane (average of red, green and blue) and color planes (red, green and blue components). The techniques Gray-Haar Wavelets and Color-Haar Wavelets are tested on an image database having 11 categories with a total of 1000 images. A total of 55 queries are fired on the database. The results show that precision and recall of Haar Wavelets are better than complete Haar transform based CBIR, which proves that Haar Wavelets give better discrimination capability in image retrieval at higher query execution speed, per higher level Haar Wavelets. Color-Haar Wavelets based CBIR have greater precision and recall than Gray-Haar Wavelets based CBIR. The Haar Wavelets level-5 outperforms other Haar Wavelets, because the higher level Haar Wavelets are giving very coarse color-texture features while the lower level are representing very fine color-texture features which are less useful to differentiate the images in image retrieval.
TL;DR: This paper proposes a new method to exploit spatial relationships between image features, based on binned log-polar grids, and shows that this approach improves the results on three diverse datasets over the SPM technique.
Abstract: This paper presents a new model for capturing spatial information for object categorization with bag-of-words (BOW). BOW models have recently become popular for the task of object recognition, owing to their good performance and simplicity. Much work has been proposed over the years to improve the BOW model, of which the Spatial Pyramid Matching (SPM) technique is the most notable. We propose a new method to exploit spatial relationships between image features, based on binned log-polar grids. Our model works by partitioning the image into grids of different scales and orientations and computing a histogram of local features within each grid cell. Experimental results show that our approach improves on the results of the SPM technique on three diverse datasets.
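A rough sketch of histogramming visual words over a binned log-polar grid is given below. It is not the authors' model: the grid center, the numbers of radial and angular bins, the `log1p` radial spacing, and all function names are assumptions made only for illustration.

```python
import math


def log_polar_bin(x, y, cx, cy, r_max, n_rad, n_ang):
    """Map a point to (radial_bin, angular_bin) around the center (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    # Log-spaced radial bins; log1p keeps r = 0 in bin 0.
    # Points at or beyond r_max are clipped into the outermost bin.
    rad = min(int(n_rad * math.log1p(r) / math.log1p(r_max)), n_rad - 1)
    theta = math.atan2(dy, dx) % (2 * math.pi)
    ang = min(int(n_ang * theta / (2 * math.pi)), n_ang - 1)
    return rad, ang


def log_polar_histogram(features, cx, cy, r_max, n_rad, n_ang, vocab_size):
    """Count visual words per log-polar cell.

    `features` is a list of (x, y, word_index) tuples; the result is a flat
    list of length n_rad * n_ang * vocab_size.
    """
    hist = [0] * (n_rad * n_ang * vocab_size)
    for x, y, word in features:
        rad, ang = log_polar_bin(x, y, cx, cy, r_max, n_rad, n_ang)
        hist[(rad * n_ang + ang) * vocab_size + word] += 1
    return hist


# Four quantized features around a hypothetical center (5, 5).
feats = [(5, 5, 0), (6, 6, 1), (4, 6, 1), (15, 5, 0)]
hist = log_polar_histogram(feats, cx=5, cy=5, r_max=10,
                           n_rad=3, n_ang=4, vocab_size=2)
```

Repeating this for several centers, scales, or grid rotations and concatenating the histograms would give a multi-grid descriptor in the spirit of the abstract, though the exact combination used in the paper is not stated here.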
TL;DR: The proposed unsupervised method to address video object extraction (VOE) in uncontrolled videos, i.e. videos captured by low-resolution and freely moving cameras, advocates the use of dense optical-flow trajectories (DOTs), which are obtained by propagating the optical flow information at the pixel level.
Abstract: We propose an unsupervised method to address video object extraction (VOE) in uncontrolled videos, i.e. videos captured by low-resolution and freely moving cameras. We advocate the use of dense optical-flow trajectories (DOTs), which are obtained by propagating the optical flow information at the pixel level. Therefore, no interest point extraction is required in our framework. To integrate color and shape information of moving objects, we group the DOTs at the super-pixel level to extract co-motion regions, and use the associated pyramid histogram of oriented gradients (PHOG) descriptors to extract objects of interest across video frames. Our approach for VOE is easy to implement, and the use of DOTs for both motion segmentation and object tracking is more robust than existing trajectory-based methods. Experiments on several video sequences demonstrate the feasibility of our proposed VOE framework.
TL;DR: This paper introduces several novel bag of visual keywords methods and compares them with the currently dominating hard bag-of-features (HBOF) approach that uses a hard assignment scheme to compute cluster frequencies.
Abstract: Object recognition systems need effective image descriptors to obtain good performance levels. Currently, the most widely used image descriptor is the SIFT descriptor, which computes histograms of gradient orientations around points in an image. A possible problem of this approach is that the number of features becomes very large when a dense grid is used, since the histograms are computed and combined for many different points. The currently dominating solution to this problem is to use a clustering method to create a visual codebook that is exploited by an appearance-based descriptor to create a histogram of visual keywords present in an image. In this paper we introduce several novel bag of visual keywords methods and compare them with the currently dominating hard bag-of-features (HBOF) approach that uses a hard assignment scheme to compute cluster frequencies. Furthermore, we combine all descriptors with a spatial pyramid and two ensemble classifiers. Experimental results on 10 and 101 classes of the Caltech-101 object database show that our novel methods significantly outperform the traditional HBOF approach and that our ensemble methods obtain state-of-the-art performance levels.
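The hard assignment scheme used by the HBOF baseline can be sketched as follows: each local descriptor votes for its single nearest codebook entry, and the votes are normalized into cluster frequencies. The codebook here is a hand-written toy (in practice it would come from a clustering method, as the abstract notes), and the function names are hypothetical; the paper's novel methods are not reproduced.

```python
def nearest_centroid(descriptor, codebook):
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def hbof_histogram(descriptors, codebook):
    """Hard bag-of-features: one vote per descriptor, normalized to frequencies."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_centroid(d, codebook)] += 1
    total = sum(counts) or 1  # avoid division by zero for an empty image
    return [c / total for c in counts]


# Toy 2-D "descriptors" and a two-word codebook (illustrative values only).
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0], [11.0, 10.0]]
frequencies = hbof_histogram(descriptors, codebook)
```

Because every descriptor contributes to exactly one bin, a descriptor lying almost midway between two visual words is treated the same as one sitting on a centroid, which is the kind of limitation that soft-assignment alternatives aim to address.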
TL;DR: A parallax representation unit in a displayed image processing unit uses a height map containing information on a height of an object for each pixel to represent different views caused by the height of the object.
Abstract: A parallax representation unit in a displayed image processing unit uses a height map containing information on a height of an object for each pixel to represent different views caused by the height of the object. A color representation unit uses, for example, texture coordinate values derived by the parallax representation unit to render the image, shifting the pixel defined in the color map. The color representation unit uses the normal map that maintains normals to the surface of the object for each pixel to change the way that light impinges on the surface and represent the roughness accordingly. A shadow representation unit uses a horizon map, which maintains information for each pixel to indicate whether a shadow is cast depending on the angle relative to the light source, so as to shadow the image rendered by the color representation unit.