Open Access Proceedings Article
Encoding High Dimensional Local Features by Sparse Coding Based Fisher Vectors
Lingqiao Liu, Chunhua Shen, Lei Wang, Anton van den Hengel, Chao Wang
- 08 Dec 2014
- Vol. 27, pp 1143-1151
TL;DR: Proposes a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace, and derives from it an encoding termed Sparse Coding based Fisher Vector Coding (SCFVC), which significantly outperforms traditional GMM-based Fisher vector encoding and achieves state-of-the-art performance on generic object recognition, indoor scene, and fine-grained image classification problems.
Abstract: Derived from the gradient vector of a generative model of local features, Fisher vector coding (FVC) has been identified as an effective coding method for image classification. Most, if not all, FVC implementations employ the Gaussian mixture model (GMM) to characterize the generation process of local features. This choice has been shown to be sufficient for traditional low-dimensional local features, e.g., SIFT, where good performance can typically be achieved with only a few hundred Gaussians. However, the same number of Gaussians is insufficient to model the feature space spanned by higher-dimensional local features, which have recently become popular. Simply increasing the number of Gaussians to improve the modeling capacity for high-dimensional features turns out to be inefficient and computationally impractical.
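To make the GMM-based baseline concrete, here is a minimal pure-Python sketch of Fisher vector coding restricted to the gradient with respect to the Gaussian means. It assumes a diagonal-covariance GMM that has already been fitted (`weights`, `means`, `stds`) and uses the Fisher-information-normalised closed form from Perronnin et al. (2010); the variance gradients and the usual power/L2 normalisation are omitted for brevity, and all function names are illustrative rather than taken from the paper.

```python
import math


def log_gaussian_diag(x, mean, std):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * math.log(2 * math.pi * s * s) - (xi - m) ** 2 / (2 * s * s)
        for xi, m, s in zip(x, mean, std)
    )


def posteriors(x, weights, means, stds):
    """Soft assignment gamma_k(x) of x to each Gaussian (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gaussian_diag(x, m, s)
            for w, m, s in zip(weights, means, stds)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector_means(features, weights, means, stds):
    """Normalised gradient of the log-likelihood w.r.t. the GMM means:
    G_k = 1 / (N * sqrt(w_k)) * sum_t gamma_k(x_t) * (x_t - mu_k) / sigma_k.
    Returns a flat list of length K * D."""
    n = len(features)
    K, D = len(means), len(means[0])
    fv = [[0.0] * D for _ in range(K)]
    for x in features:
        gamma = posteriors(x, weights, means, stds)
        for k in range(K):
            for d in range(D):
                fv[k][d] += gamma[k] * (x[d] - means[k][d]) / stds[k][d]
    return [v / (n * math.sqrt(weights[k])) for k in range(K) for v in fv[k]]
```

The mean-gradient part alone already has K × D entries, and every feature must be scored against all K Gaussians, so raising K to cover a high-dimensional feature space grows both the encoding and the per-feature cost.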
In this paper, we propose a model in which each local feature is drawn from a Gaussian distribution whose mean vector is sampled from a subspace. With a certain approximation, this model can be converted into a sparse coding procedure, and the learning and inference problems can be readily solved by standard sparse coding methods. By calculating the gradient vector of the proposed model, we derive a new Fisher vector encoding strategy, termed Sparse Coding based Fisher Vector Coding (SCFVC). Moreover, we adopt the recently developed Deep Convolutional Neural Network (CNN) descriptor as a high-dimensional local feature and implement image classification with the proposed SCFVC. Our experimental evaluations demonstrate that our method not only significantly outperforms traditional GMM-based Fisher vector encoding but also achieves state-of-the-art performance in generic object recognition, indoor scene, and fine-grained image classification problems.
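The pipeline described above can be sketched as follows, under stated assumptions. We read the model as x = B u + ε, with isotropic Gaussian noise ε and an assumed sparsity-inducing (Laplacian) prior on u, so that the mean B u lies in the subspace spanned by the columns of a dictionary B. Approximating the latent code by its MAP estimate u* turns inference into L1-regularised sparse coding, and the gradient of the resulting log-likelihood with respect to B is proportional to (x − B u*) u*ᵀ; summing these over the local features gives the image-level encoding. The dictionary `B` is assumed to be learned beforehand, ISTA is used only as a simple stand-in for a standard sparse coding solver, and all function names are hypothetical.

```python
import math


def matvec(B, u):
    """B is a D x K list of rows; returns the D-vector B @ u."""
    return [sum(b * ui for b, ui in zip(row, u)) for row in B]


def soft_threshold(v, t):
    """Proximal operator of t * |v| (shrinks v toward zero by t)."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def sparse_code_ista(x, B, lam, step, iters=200):
    """Approximately solve min_u 0.5 * ||x - B u||^2 + lam * ||u||_1 with ISTA.
    `step` should be <= 1 / L, where L is the largest eigenvalue of B^T B."""
    D, K = len(B), len(B[0])
    u = [0.0] * K
    for _ in range(iters):
        r = [xi - bi for xi, bi in zip(x, matvec(B, u))]               # residual x - B u
        grad = [-sum(B[d][k] * r[d] for d in range(D)) for k in range(K)]  # -B^T r
        u = [soft_threshold(u[k] - step * grad[k], step * lam) for k in range(K)]
    return u


def scfvc(features, B, lam, step):
    """Sum over local features of vec((x - B u*) u*^T), a D * K vector
    laid out row-major (index d * K + k)."""
    D, K = len(B), len(B[0])
    enc = [0.0] * (D * K)
    for x in features:
        u = sparse_code_ista(x, B, lam, step)
        r = [xi - bi for xi, bi in zip(x, matvec(B, u))]
        for d in range(D):
            for k in range(K):
                enc[d * K + k] += r[d] * u[k]
    return enc
```

The encoding has as many entries as the dictionary (D × K), so in this sketch the number of dictionary atoms K plays the role that the number of Gaussians plays in the GMM case, while each feature is described by a few active atoms rather than by a single nearby Gaussian.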
Citations
Melanoma Recognition in Dermoscopy Images via Aggregated Deep Convolutional Features
TL;DR: This paper presents a novel framework for dermoscopy image recognition via both a deep learning method and a local descriptor encoding strategy that is capable of generating more discriminative features to deal with large variations within melanoma classes, as well as small variations between melanoma and nonmelanoma classes with limited training data.
Fisher vectors meet Neural Networks: A hybrid classification architecture
Florent Perronnin, Diane Larlus
- 20 Apr 2015
TL;DR: A hybrid architecture that combines the strengths of Fisher vectors and CNNs: the first, unsupervised layers rely on the FV, while the subsequent fully-connected supervised layers are trained with back-propagation. The resulting system significantly outperforms standard FV systems without incurring the high cost that comes with CNNs.
Deep CNNs With Spatially Weighted Pooling for Fine-Grained Car Recognition
TL;DR: A spatially weighted pooling (SWP) strategy is proposed, which considerably improves the robustness and effectiveness of the feature representations of most mainstream DCNNs and can achieve better performance than recent approaches in the literature.
Mid-level deep pattern mining
Yao Li, Lingqiao Liu, Chunhua Shen, Anton van den Hengel
- 07 Jun 2015
TL;DR: The paper observes that the first fully-connected layer of a CNN has two appealing properties that enable its seamless integration with pattern mining; patterns are then discovered from a large number of CNN activations of image patches through the well-known association rule mining.
Cross-Convolutional-Layer Pooling for Image Recognition
TL;DR: Proposes a novel way to extract image representations from two consecutive convolutional layers: one layer is used for local feature extraction, and the other serves as guidance to pool the extracted features.
References
Speeded-Up Robust Features (SURF)
TL;DR: A novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features), which approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
•Proceedings Article
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, Trevor Darrell
- 21 Jun 2014
TL;DR: DeCAF is an open-source implementation of deep convolutional activation features, released along with all associated network parameters, to enable vision researchers to experiment with deep representations across a range of visual concept learning paradigms.
•Posted Content
CNN Features off-the-shelf: an Astounding Baseline for Recognition
TL;DR: A series of experiments on different recognition tasks, using the publicly available code and model of the OverFeat network (trained to perform object classification on ILSVRC13), suggests that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
Locality-constrained Linear Coding for image classification
Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas S. Huang, Yihong Gong
- 13 Jun 2010
TL;DR: This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) to replace the VQ coding in traditional SPM. LLC uses locality constraints to project each descriptor into its local coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation.
Improving the fisher kernel for large-scale image classification
Florent Perronnin, Jorge Sánchez, Thomas Mensink
- 05 Sep 2010
TL;DR: In an evaluation involving hundreds of thousands of training images, it is shown that classifiers learned on Flickr groups perform surprisingly well and that they can complement classifiers learned on more carefully annotated datasets.