Visual dictionary

Papers published on a yearly basis

Papers

Proceedings Article•

Visual categorization with bags of keypoints

[...]

1 Jan 2004

TL;DR: The bag-of-keypoints method is based on vector quantization of affine-invariant descriptors of image patches; it is simple, computationally efficient, and intrinsically invariant.

Abstract: We present a novel method for generic visual categorization: the problem of identifying the object content of natural images while generalizing across variations inherent to the object class. This bag of keypoints method is based on vector quantization of affine invariant descriptors of image patches. We propose and compare two alternative implementations using different classifiers: Naive Bayes and SVM. The main advantages of the method are that it is simple, computationally efficient and intrinsically invariant. We present results for simultaneously classifying seven semantic visual categories. These results clearly demonstrate that the method is robust to background clutter and produces good categorization accuracy even without exploiting geometric information.

5,369 citations
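
The pipeline described in the abstract above (quantize local descriptors against a learned vocabulary, then count word occurrences per image) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes plain Lloyd's k-means as the vector quantizer, uses toy 2-D points in place of affine-invariant patch descriptors, and omits the Naive Bayes or SVM classifier. All function names are hypothetical.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, point):
    """Index of the center closest to point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], point))

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; the k returned centers act as visual words."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centers, p)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def bag_of_keypoints(descriptors, centers):
    """Normalized histogram of visual-word occurrences for one image."""
    counts = [0] * len(centers)
    for d in descriptors:
        counts[nearest(centers, d)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Toy 2-D "descriptors" standing in for real patch descriptors (assumption).
rng = random.Random(1)
training = [[rng.gauss(cx, 0.1), rng.gauss(cy, 0.1)]
            for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]
vocabulary = kmeans(training, k=3)

image_descriptors = [[0.1, -0.1], [4.9, 5.2], [5.1, 4.8], [0.0, 5.1]]
histogram = bag_of_keypoints(image_descriptors, vocabulary)
```

The histogram discards where each keypoint was found, which is why such a representation uses no geometric information; it would then be passed to a classifier as a fixed-length feature vector.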

Proceedings Article•10.1109/CVPR46437.2021.01278•

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning

Zhicheng Huang, Zhaoyang Zeng¹, Yupan Huang¹, Bei Liu², Dongmei Fu, Jianlong Fu²•Institutions (2)

Sun Yat-sen University¹, Microsoft²

19 Jun 2021

TL;DR: SOHO takes a whole image as input and learns vision-language representations end to end, extracting comprehensive yet compact image features through a visual dictionary (VD) that facilitates cross-modal understanding.

Abstract: We study joint learning of Convolutional Neural Network (CNN) and Transformer for vision-language pre-training (VLPT) which aims to learn cross-modal alignments from millions of image-text pairs. State-of-the-art approaches extract salient image regions and align regions with words step-by-step. As region-based visual features usually represent parts of an image, it is challenging for existing vision-language models to fully understand the semantics from paired natural languages. In this paper, we propose SOHO to "Seeing Out of tHe bOx" that takes a whole image as input, and learns vision-language representation in an end-to-end manner. SOHO does not require bounding box annotations which enables inference 10 times faster than region-based approaches. In particular, SOHO learns to extract comprehensive yet compact image features through a visual dictionary (VD) that facilitates cross-modal understanding. VD is designed to represent consistent visual abstractions of similar semantics. It is updated on-the-fly and utilized in our proposed pre-training task Masked Visual Modeling (MVM). We conduct experiments on four well-established vision-language tasks by following standard VLPT settings. In particular, SOHO achieves absolute gains of 2.0% R@1 score on MSCOCO text retrieval 5k test split, 1.5% accuracy on NLVR2 test-P split, 6.7% accuracy on SNLI-VE test split, respectively.

319 citations
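
The abstract above says the visual dictionary is "updated on-the-fly" but does not give the update rule. One plausible reading, sketched below, maps each image feature to its nearest dictionary embedding and then nudges that embedding toward the mean of the features assigned to it. The moving-average rule, the `momentum` parameter, and the class and method names are assumptions made for illustration; the CNN, the Transformer, and the Masked Visual Modeling task are not modelled.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class VisualDictionary:
    """Toy visual dictionary: nearest-embedding lookup plus running update."""

    def __init__(self, embeddings, momentum=0.9):
        self.embeddings = [list(e) for e in embeddings]
        self.momentum = momentum  # hypothetical hyperparameter

    def lookup(self, feature):
        """Index of the dictionary embedding closest to the feature."""
        return min(range(len(self.embeddings)),
                   key=lambda i: squared_distance(self.embeddings[i], feature))

    def encode_and_update(self, features):
        """Assign each feature to an index, then update the used embeddings."""
        indices = [self.lookup(f) for f in features]
        m = self.momentum
        for idx in set(indices):
            members = [f for f, i in zip(features, indices) if i == idx]
            mean = [sum(col) / len(members) for col in zip(*members)]
            # Moving average: keep a fraction m of the old embedding.
            self.embeddings[idx] = [m * e + (1 - m) * x
                                    for e, x in zip(self.embeddings[idx], mean)]
        return indices

vd = VisualDictionary([[0.0, 0.0], [10.0, 10.0]], momentum=0.5)
indices = vd.encode_and_update([[1.0, 1.0], [3.0, 1.0], [9.0, 11.0]])
```

In this toy run the first two features share index 0 and the third maps to index 1, so features with similar values end up represented by the same dictionary entry.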

Journal Article•10.1109/TGRS.2015.2435801•

Scene Classification Based on the Multifeature Fusion Probabilistic Topic Model for High Spatial Resolution Remote Sensing Imagery

Yanfei Zhong¹, Qiqi Zhu¹, Liangpei Zhang¹•Institutions (1)

Wuhan University¹

08 Jun 2015-IEEE Transactions on Geoscience and Remote Sensing

TL;DR: A semantic allocation level (SAL) multifeature fusion strategy based on the probabilistic topic model (PTM), namely SAL-PTM (SAL-pLSA and SAL-LDA), is proposed for high spatial resolution (HSR) imagery; the experimental results confirm that SAL-PTM is superior to the single-feature methods and CAT-PTM in the scene classification of HSR imagery.

Abstract: Scene classification has been proved to be an effective method for high spatial resolution (HSR) remote sensing image semantic interpretation. The probabilistic topic model (PTM) has been successfully applied to natural scenes by utilizing a single feature (e.g., the spectral feature); however, it is inadequate for HSR images due to the complex structure of the land-cover classes. Although several studies have investigated techniques that combine multiple features, the different features are usually quantized after simple concatenation (CAT-PTM). Unfortunately, due to the inadequate fusion capacity of k-means clustering, the words of the visual dictionary obtained by CAT-PTM are highly correlated. In this paper, a semantic allocation level (SAL) multifeature fusion strategy based on PTM, namely, SAL-PTM (SAL-pLSA and SAL-LDA) for HSR imagery is proposed. In SAL-PTM: 1) the complementary spectral, texture, and scale-invariant-feature-transform features are effectively combined; 2) the three features are extracted and quantized separately by k-means clustering, which can provide appropriate low-level feature descriptions for the semantic representations; and 3) the latent semantic allocations of the three features are captured separately by PTM, which follows the core idea of PTM-based scene classification. The probabilistic latent semantic analysis (pLSA) and latent Dirichlet allocation (LDA) models were compared to test the effect of different PTMs for HSR imagery. A U.S. Geological Survey data set and the UC Merced data set were utilized to evaluate SAL-PTM in comparison with the conventional methods. The experimental results confirmed that SAL-PTM is superior to the single-feature methods and CAT-PTM in the scene classification of HSR imagery.

257 citations

Journal Article•10.1109/JBHI.2014.2308928•

A food recognition system for diabetic patients based on an optimized bag-of-features model

Marios Anthimopoulos¹, Lauro Gianola¹, Luca Scarnato¹, Peter Diem¹, Stavroula Mougiakakou¹•Institutions (1)

University of Bern¹

11 Mar 2014-IEEE Journal of Biomedical and Health Informatics

TL;DR: The proposed methodology for automatic food recognition, based on the bag-of-features (BoF) model, achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.

Abstract: Computer vision-based food recognition could be used to estimate a meal's carbohydrate content for diabetic patients. This study proposes a methodology for automatic food recognition, based on the bag-of-features (BoF) model. An extensive technical investigation was conducted for the identification and optimization of the best performing components involved in the BoF architecture, as well as the estimation of the corresponding parameters. For the design and evaluation of the prototype system, a visual dataset with nearly 5000 food images was created and organized into 11 classes. The optimized system computes dense local features, using the scale-invariant feature transform on the HSV color space, builds a visual dictionary of 10000 visual words by using the hierarchical k-means clustering and finally classifies the food images with a linear support vector machine classifier. The system achieved classification accuracy of the order of 78%, thus proving the feasibility of the proposed approach in a very challenging image dataset.

250 citations
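
The abstract above builds its visual dictionary with hierarchical k-means. A rough sketch of that idea follows: descriptors are split recursively with k-means, and each leaf of the resulting tree acts as one visual word. The branching factor, depth, toy 2-D data, and all names are assumptions for illustration; the abstract does not state the tree shape used to reach 10000 words, and the dense SIFT extraction and linear SVM are omitted.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(centers, point):
    return min(range(len(centers)),
               key=lambda i: squared_distance(centers[i], point))

def kmeans(points, k, rng, iterations=10):
    """Plain Lloyd's k-means returning k centers."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for i, members in enumerate(groups):
            if members:  # keep the old center if a group is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def build_tree(points, branching, depth, rng):
    """Recursively split points with k-means; leaves are the visual words."""
    if depth == 0 or len(points) < branching:
        return {"centers": [], "children": []}
    centers = kmeans(points, branching, rng)
    groups = [[] for _ in centers]
    for p in points:
        groups[nearest(centers, p)].append(p)
    children = [build_tree(g, branching, depth - 1, rng) for g in groups]
    return {"centers": centers, "children": children}

def count_leaves(node):
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])

def quantize(node, descriptor, path=()):
    """Descend the tree; the tuple of branch indices names the visual word."""
    if not node["children"]:
        return path
    i = nearest(node["centers"], descriptor)
    return quantize(node["children"][i], descriptor, path + (i,))

rng = random.Random(0)
descriptors = [[rng.random(), rng.random()] for _ in range(200)]
tree = build_tree(descriptors, branching=3, depth=2, rng=rng)
word = quantize(tree, [0.5, 0.5])
```

A tree with branching factor b and depth d has at most b^d leaves, and assigning a descriptor costs about b times d distance computations instead of one per word, which is the usual motivation for a hierarchical quantizer when the dictionary is large.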

Journal Article•10.1016/J.CVIU.2012.09.007•

Pooling in image representation: The visual codeword point of view

Sandra Avila¹, Nicolas Thome², Matthieu Cord², Eduardo Valle³, Arnaldo de Albuquerque Araújo¹•Institutions (3)

Universidade Federal de Minas Gerais¹, Pierre-and-Marie-Curie University², State University of Campinas³

01 May 2013-Computer Vision and Image Understanding

TL;DR: BossaNova is proposed, a novel representation for content-based concept detection in images and videos, which enriches the Bag-of-Words model, and is compact and simple to compute.

237 citations

Year	Papers
2021	8
2020	12
2019	24
2018	25
2017	25
2016	39
