A robust and efficient method for skeleton-based human action recognition and its application for cross-dataset evaluation
17
TL;DR: This paper proposes TD-Net, which improves the Double-feature Double-motion Network (DD-Net) by adding a normalised coordinates of joints (NCJ) branch to enrich the spatial information.
Abstract: Skeleton-based human action recognition has emerged recently thanks to its compactness and robustness to appearance variations. Although impressive results have been obtained in recent years, the performance of skeleton-based action recognition methods must improve before they can be deployed in real-time applications. Recently, a lightweight network structure named Double-feature Double-motion Network (DD-Net) was proposed for skeleton-based human action recognition. The DD-Net runs at high speed and achieves state-of-the-art performance on hand and body actions. However, it is suited to actions that correlate strongly with the global trajectories and cannot distinguish actions that are only weakly connected to them. In this paper, the authors propose TD-Net, an improved version of the DD-Net with an additional branch that takes the normalised coordinates of joints (NCJ) to enrich the spatial information. On five datasets for skeleton-based human activity recognition (MSR-Action3D, CMDFall, JHMDB, FPHAB, and NTU RGB+D), the TD-Net consistently outperforms the baseline DD-Net. The proposed method also outperforms state-of-the-art methods, both hand-designed and deep learning-based, on four datasets (MSR-Action3D, CMDFall, JHMDB, and FPHAB). Furthermore, the generalisation of the proposed method is confirmed through cross-dataset evaluation. To illustrate the potential of the model for real-time human action recognition, the authors deployed an application on an edge device. The experimental results show that the application can process up to 40 fps for pose estimation using MediaPipe, and that recognising an action from a skeleton sequence takes only 0.04 ms.
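The abstract does not define how the joint coordinates are normalised, so the following is only a minimal sketch of one plausible NCJ-style input, not the authors' implementation. It assumes that each frame is centred on a root joint (removing the global trajectory) and that the whole sequence is divided by a single scale factor. The function name `normalise_joint_coordinates`, the `root` index, and the `eps` constant are hypothetical.

```python
def normalise_joint_coordinates(seq, root=0, eps=1e-8):
    """Return a root-centred, scaled copy of a skeleton sequence.

    seq is a nested list shaped [frames][joints][dims].
    ASSUMPTION: this centring and scaling scheme is illustrative only;
    the paper's actual NCJ definition is not given in the abstract.
    """
    # Centre every frame on its root joint so the global trajectory is removed.
    centred = [
        [[c - r for c, r in zip(joint, frame[root])] for joint in frame]
        for frame in seq
    ]
    # Use one scale for the whole sequence so relative limb lengths are kept.
    scale = max(abs(c) for frame in centred for joint in frame for c in joint)
    return [
        [[c / (scale + eps) for c in joint] for joint in frame]
        for frame in centred
    ]


# Two frames, two joints, three coordinates per joint.
sequence = [
    [[1.0, 2.0, 3.0], [2.0, 4.0, 3.0]],
    [[5.0, 5.0, 5.0], [5.0, 1.0, 6.0]],
]
normalised = normalise_joint_coordinates(sequence)
```

With this scheme the root joint sits at the origin in every frame and translating the whole skeleton leaves the output unchanged, which is the kind of trajectory-independent spatial cue the NCJ branch is described as adding.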
Citations
Sharing-Net: Lightweight feedforward network for skeleton-based action recognition based on information sharing mechanism
TL;DR: This paper proposes Sharing-Net, a lightweight feedforward network for skeleton-based action recognition, utilizing a multi-feature input module and cross-channel information sharing mechanism to enhance accuracy while guaranteeing high speed on various datasets.
7
Hybrid LSTM and GAN model for action recognition and prediction of lawn tennis sport activities
Yong Wang, Jawad Khan
TL;DR: This paper presents an innovative framework that leverages deep learning, particularly dilated neural networks, for real-time spatio-temporal tennis analysis on standard hardware, aiming to enhance player performance insights and action prediction through TensorFlow.
5
Accurate continuous action and gesture recognition method based on skeleton and sliding windows techniques
Viet Duc Le, Thi-Lich Nghiem, Thi-Lan Le
- 31 Oct 2023
TL;DR: This paper proposes a method for continuous action recognition that combines a sliding-window technique with a lightweight action classification model named DDNet, and evaluates the approach on several benchmark datasets.
3
Structure and Sequencing Preserving Representations for Skeleton-based Action Recognition Relying on Attention Mechanisms
TL;DR: This paper proposes a range of representations of skeletal data and evaluates and contrasts them, first to introduce distinct ways of simultaneously addressing temporal and spatial aspects, and second to identify the most effective solution.
2
References
Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition
Sijie Yan, Yuanjun Xiong, Dahua Lin
- 27 Apr 2018
TL;DR: This paper proposes a novel model of dynamic skeletons called Spatial-Temporal Graph Convolutional Networks (ST-GCN), which moves beyond the limitations of previous methods by automatically learning both the spatial and temporal patterns from data.
HMDB: A large video database for human motion recognition
Hilde Kuehne, Hueihan Jhuang, Estibaliz Garrote, Tomaso Poggio, Thomas Serre
- 06 Nov 2011
TL;DR: This paper uses the largest action video database to-date with 51 action categories, which in total contain around 7,000 manually annotated clips extracted from a variety of sources ranging from digitized movies to YouTube, to evaluate the performance of two representative computer vision systems for action recognition and explore the robustness of these methods under various conditions.
NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis
Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang
- 01 Jun 2016
TL;DR: This paper introduces a large-scale dataset for RGB+D human action recognition with more than 56 thousand video samples and 4 million frames, collected from 40 distinct subjects, and proposes a new recurrent neural network structure that models the long-term temporal correlation of the features for each body part and uses them for better action classification.
Human activity analysis: A review
Jake K. Aggarwal, Michael S. Ryoo
TL;DR: This article provides a detailed overview of various state-of-the-art research papers on human activity recognition, discussing both the methodologies developed for simple human actions and those for high-level activities.
2.3K
Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition
Lei Shi, Yifan Zhang, Jian Cheng, Hanqing Lu
- 15 Jun 2019
TL;DR: This paper proposes a two-stream adaptive graph convolutional network (2s-AGCN) that models both the first-order and the second-order information simultaneously, which notably improves recognition accuracy.