Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks
Pichao Wang, Wanqing Li, Song Liu, Yuyao Zhang, Zhimin Gao, Philip Ogunbona
- 01 Dec 2016
- pp 13-18
TL;DR: This paper addresses continuous gesture recognition from sequences of depth maps using Convolutional Neural Networks (ConvNets); the proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM).
Abstract: This paper addresses the problem of continuous gesture recognition from sequences of depth maps using Convolutional Neural Networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into a single image, is constructed and fed to a ConvNet for recognition. The IDMM encodes both spatial and temporal information and allows existing ConvNet models to be fine-tuned for classification without introducing millions of new parameters to learn. The proposed method is evaluated on the Large-scale Continuous Gesture Recognition track of the ChaLearn Looking at People (LAP) challenge 2016. It achieved a Mean Jaccard Index of 0.2655 and ranked 3rd in the challenge.
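The two preprocessing steps named in the abstract (QOM-based segmentation, then collapsing each segment into one image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the formulas, so everything below is an assumption: QOM is taken as the number of pixels that differ from the first frame by at least `pixel_thresh`, a gesture is taken as a run of frames whose QOM exceeds `qom_thresh`, and the motion map is taken as the accumulated absolute difference between consecutive frames, rescaled to 8 bits. The function names and both thresholds are hypothetical.

```python
import numpy as np


def quantity_of_movement(depth_seq, pixel_thresh=10.0):
    """Per-frame count of pixels that differ from frame 0 by >= pixel_thresh.

    Assumption: frame 0 shows the neutral (resting) pose.
    """
    seq = np.asarray(depth_seq, dtype=np.float32)
    diff = np.abs(seq - seq[0])
    return (diff >= pixel_thresh).reshape(len(seq), -1).sum(axis=1)


def segment_by_qom(qom, qom_thresh):
    """Return (start, end) frame pairs, end exclusive, where QOM > qom_thresh."""
    active = np.asarray(qom) > qom_thresh
    # Pad with False so runs touching either end of the sequence are closed.
    padded = np.concatenate(([False], active, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]


def motion_map(depth_seq):
    """Collapse a depth clip (T, H, W) into one 8-bit image (H, W).

    Assumption: accumulate absolute differences of consecutive frames
    without thresholding, then rescale so the largest value is 255.
    """
    seq = np.asarray(depth_seq, dtype=np.float32)
    acc = np.abs(seq[1:] - seq[:-1]).sum(axis=0)
    peak = acc.max()
    if peak > 0:
        acc = acc / peak * 255.0
    return acc.astype(np.uint8)
```

Under these assumptions, each `(start, end)` pair returned by `segment_by_qom` would be used to slice the depth sequence, and `motion_map` would be applied to each slice to produce the single image that is passed to a ConvNet.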
Figures

TABLE III: Comparison of the performances of the top three teams in this challenge. Our team ranked third in the ICPR ChaLearn LAP challenge 2016. 
TABLE II: Accuracies of the proposed method and baseline methods on the ChaLearn LAP ConGD Dataset. 
Fig. 3: Sample depth maps from three sequences, each containing 5 different gestures. Each row corresponds to one depth video sequence. The labels from top left to bottom right are: (a) CraneHandSignals/EverythingSlow; (b) RefereeVolleyballSignals2/BallServedIntoNetPlayerTouchingNet; (c) GestunoDisaster/110 earthquake tremblementdeterre; (d) DivingSignals2/You; (e) RefereeVolleyballSignals2/BallServedIntoNetPlayerTouchingNet; (f) Mudra2/Sandamsha; (g) Mudra2/Sandamsha; (h) DivingSignals2/CannotOpenReserve; (i) GestunoTopography/95 region region; (j) DivingSignals2/Meet; (k) RefereeVolleyballSignals1/Timeout; (l) SwatHandSignals1/DogNeeded; (m) DivingSignals2/ReserveOpened; (n) DivingSignals1/ComeHere; (o) DivingSignals1/Watch, SwatHandSignals2/LookSearch. 
Fig. 1: The framework of the proposed method.
Citations
EgoGesture: A New Dataset and Benchmark for Egocentric Hand Gesture Recognition
TL;DR: A new benchmark dataset named EgoGesture is introduced with sufficient size, variation, and reality to be able to train deep neural networks and provides an in-depth analysis on input modality selection and domain adaptation between different scenes.
337
Scene Flow to Action Map: A New Representation for RGB-D Based Action Recognition with Convolutional Neural Networks
Pichao Wang, Wanqing Li, Zhimin Gao, Yuyao Zhang, Chang Tang, Philip Ogunbona
- 01 Jul 2017
TL;DR: A new representation, namely, Scene Flow to Action Map (SFAM), that describes several long term spatio-temporal dynamics for action recognition from RGB-D data and takes better advantage of the trained ConvNets models over ImageNet.
•Posted Content
RGB-D-based Human Motion Recognition with Deep Learning: A Survey
TL;DR: A detailed overview of recent advances in RGB-D-based motion recognition is presented in this paper, where the reviewed methods are broadly categorized into four groups, depending on the modality adopted for recognition: RGB-based, depth based, skeleton-based and RGB+D based.
159
Egocentric Gesture Recognition Using Recurrent 3D Convolutional Neural Networks with Spatiotemporal Transformer Modules
Congqi Cao, Yifan Zhang, Yi Wu, Hanqing Lu, Jian Cheng
- 01 Oct 2017
TL;DR: A novel recurrent 3D convolutional neural network with recurrent connections between neighboring time slices which can actively transform a 3D feature map into a canonical view in both spatial and temporal dimensions is designed.
132
MultiD-CNN: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in RGB-D image sequences
TL;DR: An effective multi-dimensional feature learning approach, termed MultiD-CNN, for human gesture recognition in RGB-D videos is presented; it outperforms prior art in both accuracy and efficiency.
131
References
ImageNet classification with deep convolutional neural networks
TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: The authors describe a deep convolutional neural network that achieved state-of-the-art performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
- 03 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
•Posted Content
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
13.1K
Learning Spatiotemporal Features with 3D Convolutional Networks
Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri
- 07 Dec 2015
TL;DR: The learned features, namely C3D (Convolutional 3D), with a simple linear classifier outperform state-of-the-art methods on 4 different benchmarks and are comparable with current best methods on the other 2 benchmarks.