Action Recognition Based on Joint Trajectory Maps Using Convolutional Neural Networks
Pichao Wang, Zhaoyang Li, Yonghong Hou, Wanqing Li +3 more
- 01 Oct 2016
- Vol. 158, pp 102-106
TL;DR: This article proposes Joint Trajectory Maps (JTM), which encode the spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, and adopts ConvNets to exploit the discriminative features for real-time human action recognition.
Abstract: Recently, Convolutional Neural Networks (ConvNets) have shown promising performance in many computer vision tasks, especially image-based recognition. How to effectively use ConvNets for video-based recognition is still an open problem. In this paper, we propose a compact, effective yet simple method to encode spatio-temporal information carried in 3D skeleton sequences into multiple 2D images, referred to as Joint Trajectory Maps (JTM), and ConvNets are adopted to exploit the discriminative features for real-time human action recognition. The proposed method has been evaluated on three public benchmarks, i.e., the MSRC-12 Kinect gesture dataset (MSRC-12), the G3D dataset and the UTD multimodal human action dataset (UTD-MHAD), and achieved state-of-the-art results.
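As a rough illustration of the encoding idea in the abstract, the sketch below projects a 3D skeleton sequence onto three orthogonal planes and draws each joint's trajectory into a 2D image, using pixel intensity to mark time. This is a sketch under stated assumptions, not the authors' implementation: the function name `joint_trajectory_maps`, the choice of planes, the image size, and the intensity-based time encoding are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def joint_trajectory_maps(skeleton, size=64):
    """Hypothetical sketch: rasterize joint trajectories onto three planes.

    skeleton: array of shape (T, J, 3) with T frames, J joints, xyz coords.
    Returns an array of shape (3, size, size), one image per plane.
    """
    skeleton = np.asarray(skeleton, dtype=float)
    T = skeleton.shape[0]
    # Normalize each axis to [0, 1] so coordinates fit the image grid.
    lo = skeleton.min(axis=(0, 1))
    hi = skeleton.max(axis=(0, 1))
    span = np.where(hi > lo, hi - lo, 1.0)
    pix = np.round((skeleton - lo) / span * (size - 1)).astype(int)

    planes = [(0, 1), (0, 2), (1, 2)]  # assumed: xy, xz, yz projections
    maps = np.zeros((len(planes), size, size))
    for t in range(T):
        # Assumed time encoding: later frames are drawn brighter.
        value = (t + 1) / T
        for p, (a, b) in enumerate(planes):
            rows = pix[t, :, b]
            cols = pix[t, :, a]
            # Keep the brightest (latest) value where trajectories overlap.
            maps[p, rows, cols] = np.maximum(maps[p, rows, cols], value)
    return maps

# Synthetic random-walk skeleton: 30 frames, 20 joints (made-up data).
rng = np.random.default_rng(0)
demo = rng.normal(size=(30, 20, 3)).cumsum(axis=0)
maps = joint_trajectory_maps(demo)
print(maps.shape)  # (3, 64, 64)
```

Images of this kind could then be passed to a ConvNet classifier, which is the role the abstract assigns to ConvNets.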
Citations
Enhanced skeleton visualization for view invariant human action recognition
Mengyuan Liu, Hong Liu, Chen Chen +2 more
TL;DR: Enhanced skeleton visualization method encodes spatio-temporal skeletons as visual and motion enhanced color images in a compact yet distinctive manner and consistently achieves the highest accuracies on four datasets, including the largest and most challenging NTU RGB+D dataset for skeleton-based action recognition.
Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN
Shuai Li, Wanqing Li, Christopher David Cook, Ce Zhu, Yanbo Gao +4 more
- 13 Mar 2018
TL;DR: The Independently Recurrent Neural Network (IndRNN) is a new type of RNN in which neurons in the same layer are independent of each other and are connected across layers.
A New Representation of Skeleton Sequences for 3D Action Recognition
Qiuhong Ke, Mohammed Bennamoun, Senjian An, Ferdous Sohel, Farid Boussaid +4 more
- 01 Jul 2017
TL;DR: Deep convolutional neural networks are used to learn long-term temporal information of the skeleton sequence from the frames of the generated clips, and a Multi-Task Learning Network (MTLN) jointly processes all frames of the clips in parallel to incorporate spatial structural information for action recognition.
Deep Learning for Spatio-Temporal Data Mining: A Survey
TL;DR: A comprehensive survey of recent progress in applying deep learning techniques to spatio-temporal data mining (STDM) is provided, and the existing literature is classified by the types of spatio-temporal data, the data mining tasks, and the deep learning models.
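To illustrate the IndRNN entry above, in which neurons within a layer are independent of each other, the sketch below implements one recurrent layer whose recurrent weight is a vector applied element-wise rather than a full matrix. The function name `indrnn_layer`, the ReLU activation, and the array shapes are illustrative assumptions and are not taken from the listing.

```python
import numpy as np

def indrnn_layer(x, W, u, b):
    """Hypothetical single IndRNN-style layer.

    x: (T, D) input sequence; W: (H, D) input weights;
    u: (H,) per-neuron recurrent weights; b: (H,) biases.
    Returns the hidden states, shape (T, H).
    """
    T = x.shape[0]
    h = np.zeros(W.shape[0])
    outputs = np.empty((T, W.shape[0]))
    for t in range(T):
        # Element-wise recurrence: neuron i only sees its own previous state.
        h = np.maximum(0.0, W @ x[t] + u * h + b)  # ReLU is an assumption
        outputs[t] = h
    return outputs

x = np.ones((3, 1))
W = np.ones((2, 1))
u = np.array([0.0, 1.0])  # neuron 0 has no memory, neuron 1 accumulates
b = np.zeros(2)
states = indrnn_layer(x, W, u, b)
print(states)
```

Because `u` is a vector, changing the recurrent weight of one neuron leaves the states of the others untouched; mixing between neurons would happen only through the input weights of a following layer.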
References
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler, Rob Fergus +1 more
- 06 Sep 2014
TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark.
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell +7 more
- 03 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Learning Spatiotemporal Features with 3D Convolutional Networks
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri +5 more
- 07 Dec 2015
TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.
3D Convolutional Neural Networks for Human Action Recognition
TL;DR: A novel 3D CNN model for action recognition is developed, which extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.
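The last two entries rely on 3D convolution, which slides a kernel over time as well as height and width, so that each output value mixes information from several adjacent frames. A minimal NumPy sketch of a single-channel, valid-mode 3D convolution (computed as cross-correlation, as is common in deep learning libraries) is given below. The function name `conv3d_valid` and the loop-based form are illustrative assumptions rather than anything taken from the listed papers.

```python
import numpy as np

def conv3d_valid(video, kernel):
    """Hypothetical single-channel 3D convolution without padding.

    video: (T, H, W) frames; kernel: (kt, kh, kw).
    Returns an array of shape (T-kt+1, H-kh+1, W-kw+1).
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                # Each patch spans kt adjacent frames, so motion is visible.
                patch = video[t:t + kt, i:i + kh, j:j + kw]
                out[t, i, j] = np.sum(patch * kernel)
    return out

video = np.ones((8, 6, 6))    # 8 frames of 6x6 pixels (made-up data)
kernel = np.ones((3, 3, 3))   # spans 3 frames and a 3x3 spatial window
features = conv3d_valid(video, kernel)
print(features.shape)  # (6, 4, 4)
```

A 2D convolution applied frame by frame would correspond to `kt = 1` here, which is why it cannot by itself capture motion across frames.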