Book Chapter, DOI: 10.1007/11744078_25
Learning compositional categorization models
Björn Ommer, Joachim M. Buhmann
- 07 May 2006
- pp 316-329
TL;DR: This paper proposes a compositional approach to visual object categorization of scenes, in which compositions are formed as bags of parts with a locality constraint, spatially bound by means of a shape model, and passed to coupled probabilistic kernel classifiers that establish the final image categorization.
Abstract: This contribution proposes a compositional approach to visual object categorization of scenes. Compositions are learned from the Caltech 101 database and form intermediate abstractions of images that are semantically situated between low-level representations and the high-level categorization. Salient regions, which are described by localized feature histograms, are detected as image parts. Subsequently compositions are formed as bags of parts with a locality constraint. After performing a spatial binding of compositions by means of a shape model, coupled probabilistic kernel classifiers are applied thereupon to establish the final image categorization. In contrast to the discriminative training of the categorizer, intermediate compositions are learned in a generative manner yielding relevant part agglomerations, i.e. groupings which are frequently appearing in the dataset while simultaneously supporting the discrimination between sets of categories. Consequently, compositionality simplifies the learning of a complex categorization model for complete scenes by splitting it up into simpler, sharable compositions. The architecture is evaluated on the highly challenging Caltech 101 database which exhibits large intra-category variations. Our compositional approach shows competitive retrieval rates in the range of 53.6 ± 0.88% or, with a multi-scale feature set, rates of 57.8 ± 0.79%.
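The step in which compositions are formed as bags of parts with a locality constraint can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the descriptors, the grouping rule, or any parameters, so the hard codeword assignment, the fixed `radius`, and the function name `form_compositions` are all assumptions made for illustration.

```python
from math import hypot


def form_compositions(parts, num_codewords, radius):
    """Group parts into local bags (hypothetical sketch, not the paper's method).

    parts: list of (x, y, codeword) tuples, where codeword is an assumed
           hard assignment of the part's feature histogram to a codebook entry.
    Returns one normalized codeword histogram per part, built from all parts
    that lie within `radius` of it (the assumed locality constraint).
    """
    compositions = []
    for cx, cy, _ in parts:
        hist = [0.0] * num_codewords
        for x, y, k in parts:
            # Locality constraint: only nearby parts join the bag.
            if hypot(x - cx, y - cy) <= radius:
                hist[k] += 1.0
        # Each part is within distance 0 of itself, so the total is never zero.
        total = sum(hist)
        compositions.append([h / total for h in hist])
    return compositions


# Toy data: three nearby parts and one isolated part.
parts = [(0, 0, 0), (1, 0, 1), (0, 1, 1), (10, 10, 2)]
comps = form_compositions(parts, num_codewords=3, radius=2.0)
```

In this toy input the first three parts share one bag, while the isolated part forms a bag containing only itself. The spatial binding by a shape model and the kernel classifiers described in the abstract are not covered by this sketch.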
Citations
SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition
Hao Zhang, Alexander C. Berg, Michael Maire, Jitendra Malik
- 17 Jun 2006
TL;DR: This work considers visual category recognition in the framework of measuring similarities, or equivalently perceptual distances, to prototype examples of categories. It proposes a hybrid of nearest-neighbor classification and support vector machines (SVM-KNN) that deals naturally with the multiclass setting, has reasonable computational complexity both in training and at run time, and yields excellent results in practice.
Proximity Distribution Kernels for Geometric Context in Category Recognition
Haibin Ling, Stefano Soatto
- 26 Dec 2007
TL;DR: This work proposes a novel "proximity distribution kernel" that naturally combines local geometric and photometric information from images. The kernel satisfies Mercer's condition and can be readily combined with a support vector machine to perform visual categorization in a way that is insensitive to photometric and geometric variations, while retaining significant discriminative power.
Learning an Alphabet of Shape and Appearance for Multi-Class Object Detection
Andreas Opelt, Axel Pinz, Andrew Zisserman
- 01 Oct 2008
TL;DR: This work presents a novel algorithmic approach to object categorization and detection that can learn category-specific detectors, using Boosting, from a visual alphabet of shape and appearance. It shows that incremental learning of a Boundary-Fragment-Model (BFM) for many categories leads to sub-linear growth of visual alphabet entries through the sharing of shape features.
Learning the Compositional Nature of Visual Objects
Björn Ommer, Joachim M. Buhmann
- 17 Jun 2007
TL;DR: Adopting a compositional modeling strategy, the authors automatically decompose objects into a hierarchy of relevant compositions and learn such a compositional representation for each category without supervision, obtaining a category-level object recognition system.
Compositional Boosting for Computing Hierarchical Image Structures
Tianfu Wu, Gui-Song Xia, Song-Chun Zhu
- 17 Jun 2007
TL;DR: This work presents a compositional boosting algorithm for detecting and recognizing 17 common image structures in low- to middle-level vision tasks, and applies it to a wide range of indoor and outdoor images with satisfactory results.
References
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Rapid object detection using a boosted cascade of simple features
Paul A. Viola, Michael Jones
- 01 Dec 2001
TL;DR: This paper describes a machine learning approach for visual object detection that is capable of processing images extremely rapidly while achieving high detection rates. It introduces a new image representation called the "integral image", which allows the features used by the detector to be computed very quickly.
Recognition-by-Components: A Theory of Human Image Understanding.
TL;DR: Recognition-by-components (RBC) provides a principled account of the heretofore undecided relation between the classic principles of perceptual organization and pattern recognition.
Proceedings Article
Visual categorization with bags of keypoints
Gabriela Csurka
- 01 Jan 2004
TL;DR: The bag-of-keypoints method is based on vector quantization of affine invariant descriptors of image patches and is shown to be simple, computationally efficient and intrinsically invariant.
Scale & Affine Invariant Interest Point Detectors
TL;DR: A comparative evaluation of different detectors is presented and it is shown that the proposed approach for detecting interest points invariant to scale and affine transformations provides better results than existing methods.