Book Chapter, DOI: 10.1007/978-3-319-10602-1_38
Action Recognition with Stacked Fisher Vectors
Xiaojiang Peng, Changqing Zou, Yu Qiao, Qiang Peng
- 06 Sep 2014
- pp 581-595
TL;DR: Experiments on YouTube, J-HMDB, and HMDB51 demonstrate the effectiveness of SFV; combining the traditional FV with SFV outperforms state-of-the-art methods on these datasets by a large margin.
Abstract: Video representation is a vital problem in action recognition. This paper proposes Stacked Fisher Vectors (SFV), a new representation with multi-layer nested Fisher vector encoding, for action recognition. In the first layer, we densely sample large subvolumes from input videos, extract local features, and encode them using Fisher vectors (FVs). The second layer compresses the FVs of the subvolumes obtained in the previous layer, and then encodes them again with Fisher vectors. Compared with the standard FV, SFV refines the representation and abstracts semantic information in a hierarchical way. Compared with recent mid-level action representations, SFV does not need to mine discriminative action parts, yet it preserves mid-level information through Fisher vector encoding in the higher layer. We evaluate the proposed methods on three challenging datasets, namely YouTube, J-HMDB, and HMDB51. Experimental results demonstrate the effectiveness of SFV, and the combination of the traditional FV and SFV outperforms state-of-the-art methods on these datasets by a large margin.
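The two-layer encoding described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`posteriors`, `fit_diag_gmm`, `fisher_vector`, `pca_fit`, `stacked_fisher_vector`) are hypothetical, plain PCA stands in for the compression step (the abstract does not say which method is used), power and L2 normalization are assumed, and random arrays replace real local descriptors extracted from subvolumes.

```python
import numpy as np

def posteriors(X, w, mu, var):
    """Soft-assign each row of X to the components of a diagonal GMM."""
    diff = X[:, None, :] - mu[None, :, :]                     # (n, k, d)
    log_p = -0.5 * (np.sum(diff**2 / var, axis=2)
                    + np.sum(np.log(2 * np.pi * var), axis=1)) + np.log(w)
    log_p -= log_p.max(axis=1, keepdims=True)                 # numerical stability
    q = np.exp(log_p)
    return q / q.sum(axis=1, keepdims=True)

def fit_diag_gmm(X, k, iters=20, seed=0):
    """Fit a diagonal-covariance GMM with a few EM steps (illustrative, untuned)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mu = X[rng.choice(n, size=k, replace=False)]
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        q = posteriors(X, w, mu, var)                         # E-step
        nk = q.sum(axis=0) + 1e-10                            # M-step
        w = nk / n
        mu = (q.T @ X) / nk[:, None]
        var = np.maximum((q.T @ X**2) / nk[:, None] - mu**2, 1e-6)
    return w, mu, var

def fisher_vector(X, gmm):
    """Encode a set of descriptors as gradients w.r.t. GMM means and variances."""
    w, mu, var = gmm
    n = X.shape[0]
    q = posteriors(X, w, mu, var)                             # (n, k)
    z = (X[:, None, :] - mu) / np.sqrt(var)                   # (n, k, d)
    g_mu = (q[:, :, None] * z).sum(axis=0) / (n * np.sqrt(w)[:, None])
    g_var = (q[:, :, None] * (z**2 - 1)).sum(axis=0) / (n * np.sqrt(2 * w)[:, None])
    fv = np.concatenate([g_mu.ravel(), g_var.ravel()])
    fv = np.sign(fv) * np.sqrt(np.abs(fv))                    # power normalization (assumed)
    return fv / (np.linalg.norm(fv) + 1e-12)                  # L2 normalization (assumed)

def pca_fit(F, dim):
    """PCA as a stand-in for the compression step between the two layers."""
    mean = F.mean(axis=0)
    _, _, vt = np.linalg.svd(F - mean, full_matrices=False)
    return mean, vt[:dim]

def stacked_fisher_vector(subvolumes, gmm1, pca, gmm2):
    """Layer 1: one FV per subvolume. Layer 2: compress those FVs, encode again."""
    local_fvs = np.stack([fisher_vector(s, gmm1) for s in subvolumes])
    mean, proj = pca
    compressed = (local_fvs - mean) @ proj.T
    return fisher_vector(compressed, gmm2)

# Synthetic stand-in data: 4 videos, 12 subvolumes each, 50 descriptors of dim 8.
rng = np.random.default_rng(0)
videos = [[rng.normal(size=(50, 8)) for _ in range(12)] for _ in range(4)]

gmm1 = fit_diag_gmm(np.vstack([s for v in videos for s in v]), k=4)
layer1 = np.stack([fisher_vector(s, gmm1) for v in videos for s in v])  # (48, 64)
pca = pca_fit(layer1, dim=6)
gmm2 = fit_diag_gmm((layer1 - pca[0]) @ pca[1].T, k=3)

sfv = stacked_fisher_vector(videos[0], gmm1, pca, gmm2)
print(sfv.shape)  # one video-level vector of length 2 * k2 * dim
```

In this sketch each video yields a single second-layer vector of length 2 x 3 x 6 = 36. Per the abstract, it would be combined with a conventional FV computed over the whole video before classification.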
Citations
Beyond Gaussian Pyramid: Multi-skip Feature Stacking for action recognition
Zhenzhong Lan, Ming Lin, Xuanchong Li, Alexander G. Hauptmann, Bhiksha Raj
- 07 Jun 2015
TL;DR: In this article, a Multi-skIp Feature Stacking (MIFS) method is proposed, which stacks features extracted using a family of differential filters parameterized with multiple time skips and encodes shift-invariance into the frequency space.
Human Action Recognition from Various Data Modalities: A Review
TL;DR: This paper reviews both the hand-crafted feature-based and deep learning-based methods for single data modalities and also the methods based on multiple modalities, including the fusion-based frameworks and the co-learning-based approaches for HAR.
A Review on Human Activity Recognition Using Vision-Based Method
TL;DR: This review highlights the advances of state-of-the-art activity recognition approaches, especially the activity representation and classification methods, and classifies the existing literature with a detailed taxonomy covering representation and classification methods, as well as the datasets used.
•Posted Content
Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks
TL;DR: A compact, effective yet simple method is proposed to encode the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM); ConvNets are then adopted to exploit the discriminative features for real-time human action recognition.
•Proceedings Article
Actions ~ Transformations
Xiaolong Wang, Ali Farhadi, Abhinav Gupta
- 01 Jun 2016
TL;DR: A novel representation for actions is proposed by modeling an action as a transformation which changes the state of the environment before the action happens (precondition) to the state after the action (effect).
References
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: State-of-the-art performance was achieved by a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
LIBSVM: A library for support vector machines
Chih-Chung Chang, Chih-Jen Lin
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
•Book
Pattern Recognition and Machine Learning
Christopher M. Bishop
- 17 Aug 2006
TL;DR: Topics covered include probability distributions, linear models for regression, linear models for classification, neural networks, graphical models, mixture models and EM, sampling methods, continuous latent variables, and sequential data.
Pattern Recognition and Machine Learning
Christopher M. Bishop
- 01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are presented, along with a discussion of combining models in the context of machine learning and classification.