Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors

Open Access Proceedings Article

- 08 Dec 2014

- Vol. 27, pp 1143-1151

82

TL;DR: Proposes a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. The resulting encoding, termed Sparse Coding based Fisher Vector Coding (SCFVC), significantly outperforms the traditional GMM based Fisher vector encoding and achieves state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.

Abstract: Derived from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low dimensional local features, e.g., SIFT; typically, good performance can be achieved with only a few hundred Gaussian distributions. However, the same number of Gaussians is insufficient to model the feature space spanned by higher dimensional local features, which have become popular recently. Simply increasing the number of Gaussians to improve the modeling capacity for high dimensional features turns out to be inefficient and computationally impractical. In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With a certain approximation, this model can be converted to a sparse coding procedure, and the learning/inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed Deep Convolutional Neural Network (CNN) descriptor as a high dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms the traditional GMM based Fisher vector encoding but also achieves state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
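
The encoding idea in the abstract can be illustrated with a minimal sketch. It assumes the approximated model reduces to the standard sparse coding objective min_u 0.5*||x - B u||^2 + lam*||u||_1 for a dictionary B, in which case the gradient of that objective with respect to B at the optimal code u is proportional to (x - B u) u^T. Everything concrete here is an assumption for illustration and not the authors' implementation: the function names, the value of `lam`, the random (rather than learned) dictionary, the plain ISTA solver, and the use of sum pooling over local features.

```python
import numpy as np

def sparse_code_ista(x, B, lam=0.1, n_iter=200):
    """Approximately solve min_u 0.5*||x - B u||^2 + lam*||u||_1 with plain ISTA."""
    # Step size 1/L, where L is the largest eigenvalue of B^T B.
    L = np.linalg.norm(B, 2) ** 2
    u = np.zeros(B.shape[1])
    for _ in range(n_iter):
        grad = B.T @ (B @ u - x)          # gradient of the quadratic term
        z = u - grad / L                  # gradient step
        u = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return u

def scfvc_encode(X, B, lam=0.1):
    """Sum-pool the per-feature gradients (x - B u) u^T into one flat vector."""
    d, m = B.shape
    G = np.zeros((d, m))
    for x in X:
        u = sparse_code_ista(x, B, lam)
        # Residual times code: non-zero only in columns of active dictionary atoms.
        G += np.outer(x - B @ u, u)
    return G.ravel()

# Toy data (hypothetical sizes): 10 local features of dimension 16, 32 atoms.
rng = np.random.default_rng(0)
B = rng.standard_normal((16, 32))
B /= np.linalg.norm(B, axis=0)            # unit-norm dictionary atoms
X = rng.standard_normal((10, 16))

fv = scfvc_encode(X, B)
print(fv.shape)  # (512,)
```

The image-level vector has one entry per (feature dimension, dictionary atom) pair, so its length grows linearly with the dictionary size; in practice a learned dictionary and a normalization step would replace the toy choices above.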

Citations

Journal Article•10.1109/TBME.2018.2866166

Melanoma Recognition in Dermoscopy Images via Aggregated Deep Convolutional Features

Zhen Yu, +7 more

- 01 Apr 2019

- IEEE Transactions on Biomedical Engineer...

TL;DR: This paper presents a novel framework for dermoscopy image recognition via both a deep learning method and a local descriptor encoding strategy that is capable of generating more discriminative features to deal with large variations within melanoma classes, as well as small variations between melanoma and nonmelanoma classes with limited training data.

260

Proceedings Article•10.1109/CVPR.2015.7298998

Fisher vectors meet Neural Networks: A hybrid classification architecture

Florent Perronnin, +1 more

- 20 Apr 2015

TL;DR: A hybrid architecture that combines the strengths of Fisher vectors (FV) and CNNs: the first unsupervised layers rely on the FV while the subsequent fully-connected supervised layers are trained with back-propagation. It significantly outperforms standard FV systems without incurring the high cost that comes with CNNs.

260

Journal Article•10.1109/TITS.2017.2679114

Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition

Qichang Hu, +3 more

- 04 Apr 2017

- IEEE Transactions on Intelligent Transpo...

TL;DR: A spatially weighted pooling (SWP) strategy is proposed, which considerably improves the robustness and effectiveness of the feature representation of most dominant DCNNs and can achieve better performance than recent approaches in the literature.

125

•Proceedings Article•10.1109/CVPR.2015.7298699

Mid-level deep pattern mining

Yao Li, +3 more

- 07 Jun 2015

TL;DR: Activations from the first fully-connected layer of a CNN are shown to have two appealing properties that enable seamless integration with pattern mining; patterns are then discovered from a large number of CNN activations of image patches through well-known association rule mining.

111

•Journal Article•10.1109/TPAMI.2016.2637921

Cross-Convolutional-Layer Pooling for Image Recognition

Lingqiao Liu, +2 more

- 01 Nov 2017

- IEEE Transactions on Pattern Analysis an...

TL;DR: This paper proposes a novel way to extract image representations from two consecutive convolutional layers: one layer is used for local feature extraction, and the other serves as guidance to pool the extracted features.

97

References

•Journal Article•10.1016/J.CVIU.2007.09.014

Speeded-Up Robust Features (SURF)

Herbert Bay, +3 more

- 01 Jun 2008

- Computer Vision and Image Understanding

TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

14.9K

•Proceedings Article

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

Jeff Donahue, +6 more

- 21 Jun 2014

TL;DR: DeCAF is an open-source implementation of deep convolutional activation features, released along with all associated network parameters, to enable vision researchers to experiment with deep representations across a range of visual concept learning paradigms.

4.7K

•Posted Content

CNN Features off-the-shelf: an Astounding Baseline for Recognition

Ali Sharif Razavian, +3 more

- 23 Mar 2014

- arXiv: Computer Vision and Pattern Recog...

TL;DR: A series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13 suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.

4.5K

Proceedings Article•10.1109/CVPR.2010.5540018

Locality-constrained Linear Coding for image classification

Jinjun Wang, +5 more

- 13 Jun 2010

TL;DR: This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM, using the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation.

3.7K

•Book Chapter•10.1007/978-3-642-15561-1_11

Improving the fisher kernel for large-scale image classification

Florent Perronnin, +2 more

- 05 Sep 2010

TL;DR: In an evaluation involving hundreds of thousands of training images, it is shown that classifiers learned on Flickr groups perform surprisingly well and that they can complement classifiers learned on more carefully annotated datasets.

3.2K