Open Access•Proceedings Article•10.1109/CVPR.2013.98

HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences

- 23 Jun 2013

- pp 716-723

1.1K

TL;DR: A new descriptor for activity recognition from videos acquired by a depth sensor is presented that better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.

Abstract: We present a new descriptor for activity recognition from videos acquired by a depth sensor. Previous descriptors mostly compute shape and motion features independently, thus, they often fail to capture the complex joint shape-motion cues at pixel-level. In contrast, we describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates. To build the histogram, we create 4D projectors, which quantize the 4D space and represent the possible directions for the 4D normal. We initialize the projectors using the vertices of a regular polychoron. Consequently, we refine the projectors using a discriminative density measure, such that additional projectors are induced in the directions where the 4D normals are more dense and discriminative. Through extensive experiments, we demonstrate that our descriptor better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.
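The pipeline in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`normals_4d`, `projectors_16cell`, `hon4d_histogram`) are hypothetical, the normals are estimated with simple finite differences, the projectors are the 8 vertices of the 16-cell (one regular polychoron, chosen here only for brevity), and each normal votes for a projector through its clipped inner product. Spatio-temporal cell pooling and the discriminative refinement step are omitted.

```python
import numpy as np

def normals_4d(depth):
    """Unit 4D normals for a depth sequence of shape (T, H, W).

    Treats depth as z = f(x, y, t); the normal is taken as
    (dz/dx, dz/dy, dz/dt, -1), normalized to unit length.
    """
    # np.gradient returns one derivative per axis, in axis order: t, y, x
    dz_dt, dz_dy, dz_dx = np.gradient(depth.astype(float))
    n = np.stack([dz_dx, dz_dy, dz_dt, -np.ones_like(dz_dx)], axis=-1)
    # The norm is always >= 1 because of the constant -1 component
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def projectors_16cell():
    """Vertices of the 16-cell: +/- each coordinate axis, shape (8, 4).

    Assumption: any regular polychoron could be substituted here.
    """
    eye = np.eye(4)
    return np.vstack([eye, -eye])

def hon4d_histogram(depth, projectors):
    """Histogram of 4D normal orientations over the whole sequence."""
    n = normals_4d(depth).reshape(-1, 4)
    # Project every normal on every projector; negative projections are dropped
    votes = np.maximum(n @ projectors.T, 0.0)
    hist = votes.sum(axis=0)
    return hist / hist.sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sequence = rng.random((10, 32, 32))   # synthetic stand-in for depth frames
    print(hon4d_histogram(sequence, projectors_16cell()).shape)
```

A finer polychoron gives more projectors and therefore a finer quantization of the 4D orientation space; the sketch keeps the coarsest option so the vertex list stays readable.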

Most frequently asked questions

1. What are the contributions in "HON4D: Histogram of Oriented 4D Normals for Activity Recognition from Depth Sequences"?

The authors present a new descriptor for activity recognition from videos acquired by a depth sensor. In contrast to previous descriptors, which mostly compute shape and motion features independently, the authors describe the depth sequence using a histogram capturing the distribution of the surface normal orientation in the 4D space of time, depth, and spatial coordinates. To build the histogram, the authors create 4D projectors, which quantize the 4D space and represent the possible directions for the 4D normal. The authors initialize the projectors using the vertices of a regular polychoron. They then refine the projectors using a discriminative density measure, such that additional projectors are induced in the directions where the 4D normals are more dense and discriminative. Through extensive experiments, the authors demonstrate that their descriptor better captures the joint shape-motion cues in the depth sequence, and thus outperforms the state-of-the-art on all relevant benchmarks.

Figure 1. Surface normals overlaid on three examples from MSR Action 3D dataset [12]. The surface normals capture the shape cues at a specific time instance, while the change in the surface normal over time captures the motion cues. In this paper, we use 4D normals computed in the space of depth, time, and spatial coordinates in order to obtain rich descriptors of activities. Note that in the figure we illustrate 3D surface normals since it is difficult to visualize the 4D normals used in the paper.
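One way to make the caption concrete is the following sketch, which assumes the depth sequence is modeled as a function z = f(x, y, t). The symbols f, S, and n are introduced here for illustration only.

```latex
% Depth sequence as a surface in the 4D space (x, y, t, z)
S(x, y, t, z) = f(x, y, t) - z = 0

% Its normal is the gradient of S
n = \nabla S
  = \left( \frac{\partial z}{\partial x},\;
           \frac{\partial z}{\partial y},\;
           \frac{\partial z}{\partial t},\;
           -1 \right)^{T}
```

Under this reading, the two spatial derivatives carry the shape cues at a given time instance, and the temporal derivative carries the motion cue, so a single 4D normal encodes both jointly.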

Table 1. The performance of our method on MSR Action 3D dataset, compared to previous approaches.

Figure 4. Example frames from different actions obtained from MSR Action 3D dataset [12], MSR Hand Gesture dataset [23], and MSR Daily Activity 3D [24].

Table 2. The performance of our method on MSR Hand Gesture 3D dataset, compared to previous approaches.

Table 3. The performance of our method on 3D action pairs dataset, compared to previous approaches.

Figure 5. The confusion tables for 3D Action Pairs dataset. Top: Pair-wise skeleton features and LOP features from [24] without temporal pyramid (left), and with pyramid (right). Bottom: HON4D features as is (left), and after refining the projectors using the discriminative density (right).

Citations

Proceedings Article•10.1109/CGVIS.2015.7449900

Action recognition for human robot interaction in industrial applications

Sharath Chandra Akkaladevi, +1 more

- 01 Nov 2015

TL;DR: A set of key descriptors is learned from a collection of weak spatio-temporal skeletal joint descriptors using random forests, which reduces the computational effort and cuts the descriptor dimensionality by 61 percent.

Proceedings Article•10.1109/CVPRW.2017.203

Human Activity Recognition Using Combinatorial Deep Belief Networks

Shreyank N Gowda

- 01 Jul 2017

TL;DR: This paper proposes an approach to human activity recognition using a combination of deep belief networks, and modifies the standard local binary patterns descriptor to obtain a concatenated histogram of lower dimensions.

Journal Article•10.1109/tpami.2022.3177813

MMNet: A Model-based Multimodal Network for Human Action Recognition in RGB-D Videos

- 01 Jan 2022

- IEEE Transactions on Pattern Analysis an...

TL;DR: This paper proposes a model-based multimodal network (MMNet) that fuses skeleton and RGB modalities via a model-based approach to improve ensemble recognition accuracy by effectively applying mutually complementary information from different data modalities.

Journal Article•10.1109/tpami.2022.3161735

Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling

- 01 Feb 2023

- IEEE Transactions on Pattern Analysis an...

TL;DR: This paper proposes a Point Spatio-Temporal Transformer (PST-Transformer) that adaptively searches related or similar points across the entire video by performing self-attention on point features.

Journal Article•10.1016/J.IMAVIS.2016.11.004

Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos

Ivan Lillo, +2 more

- 01 Mar 2017

- Image and Vision Computing

TL;DR: In this paper, a hierarchical compositional model is proposed to recognize human activities using body poses estimated from RGB-D data, where geometric and motion descriptors are used to learn a dictionary of body poses and sparse compositions of these body poses are used for atomic human actions.

References

Journal Article•10.1023/B:VISI.0000029664.99615.94

Distinctive Image Features from Scale-Invariant Keypoints

David G. Lowe

- 01 Nov 2004

- International Journal of Computer Vision

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.

59.3K

Proceedings Article•10.1109/CVPR.2005.177

Histograms of oriented gradients for human detection

Navneet Dalal, +1 more

- 20 Jun 2005

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

36.7K

Journal Article•10.1109/TPAMI.2009.167

Object Detection with Discriminatively Trained Part-Based Models

Pedro F. Felzenszwalb, +3 more

- 01 Sep 2010

- IEEE Transactions on Pattern Analysis an...

TL;DR: An object detection system based on mixtures of multiscale deformable part models that is able to represent highly variable object classes and achieves state-of-the-art results in the PASCAL object detection challenges is described.

11.9K

Proceedings Article•10.1109/CVPR.2011.5995316

Real-time human pose recognition in parts from single depth images

Jamie Shotton, +7 more

- 20 Jun 2011

TL;DR: This work takes an object recognition approach, designing an intermediate body parts representation that maps the difficult pose estimation problem into a simpler per-pixel classification problem, and generates confidence-scored 3D proposals of several body joints by reprojecting the classification result and finding local modes.

4.9K

Journal Article•10.1007/S11263-005-1838-7

On Space-Time Interest Points

Ivan Laptev

- 01 Sep 2005

TL;DR: This paper builds on the idea of the Harris and Förstner interest point operators and detects local structures in space-time where the image values have significant local variations in both space and time, and illustrates how a video representation in terms of local space-time features allows for detection of walking people in scenes with occlusions and dynamic cluttered backgrounds.

3.6K

Related Papers (5)

Action recognition based on a bag of 3D points

Mining actionlet ensemble for action recognition with depth cameras

View invariant human action recognition using histograms of 3D joints

Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group

Hierarchical recurrent neural network for skeleton based action recognition