Proceedings Article. DOI: 10.1109/ICCV.2019.00400
Weakly Supervised Temporal Action Localization Through Contrast Based Evaluation Networks
Ziyi Liu, Le Wang, Qilin Zhang, Zhanning Gao, Zhenxing Niu, Nanning Zheng, Gang Hua
- 01 Oct 2019
- pp 3899-3908
TL;DR: This paper proposes the Contrast-based Localization EvaluAtioN Network (CleanNet), whose new action proposal evaluator provides pseudo-supervision by leveraging the temporal contrast in snippet-level action classification predictions; its action localization module is an integral part of the network, which enables end-to-end training.
Abstract: Weakly-supervised temporal action localization (WS-TAL) is a promising but challenging task with only video-level action categorical labels available during training. Without requiring temporal action boundary annotations in training data, WS-TAL could possibly exploit automatically retrieved video tags as video-level labels. However, such coarse video-level supervision inevitably incurs confusions, especially in untrimmed videos containing multiple action instances. To address this challenge, we propose the Contrast-based Localization EvaluAtioN Network (CleanNet) with our new action proposal evaluator, which provides pseudo-supervision by leveraging the temporal contrast in snippet-level action classification predictions. Essentially, the new action proposal evaluator enforces an additional temporal contrast constraint so that high-evaluation-score action proposals are more likely to coincide with true action instances. Moreover, the new action localization module is an integral part of CleanNet which enables end-to-end training. This is in contrast to many existing WS-TAL methods where action localization is merely a post-processing step. Experiments on THUMOS14 and ActivityNet datasets validate the efficacy of CleanNet against existing state-of-the-art WS-TAL algorithms.
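The temporal contrast idea in the abstract can be illustrated with a small sketch. This is not the paper's evaluator: the function name `contrast_score`, the `inflate` parameter, and the specific rule (mean score inside a proposal minus mean score in its flanking regions) are assumptions chosen only to show why a proposal aligned with a true action instance can score higher than one that is too wide or too narrow.

```python
def contrast_score(scores, start, end, inflate=0.25):
    """Hypothetical contrast score for the proposal [start, end).

    scores: list of snippet-level classification scores for one action class.
    Returns the mean score inside the proposal minus the mean score in the
    flanking regions just outside it (an assumed, simplified formulation).
    """
    n = len(scores)
    if not 0 <= start < end <= n:
        raise ValueError("proposal must satisfy 0 <= start < end <= len(scores)")
    # Width of each flanking region: a fraction of the proposal length,
    # but never less than one snippet.
    margin = max(1, round(inflate * (end - start)))
    inner = scores[start:end]
    # Flanking snippets, clipped to the video boundaries.
    outer = scores[max(0, start - margin):start] + scores[end:end + margin]
    inner_mean = sum(inner) / len(inner)
    # If the proposal spans the whole video there is no context to contrast with.
    outer_mean = sum(outer) / len(outer) if outer else 0.0
    return inner_mean - outer_mean


# Toy snippet scores: the action occupies snippets 2 to 5.
scores = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
for proposal in [(2, 6), (0, 8), (3, 5)]:
    print(proposal, contrast_score(scores, *proposal))
```

In this toy example the well-aligned proposal (2, 6) receives the highest score, the over-wide proposal (0, 8) is penalized by the low-scoring snippets it includes, and the too-narrow proposal (3, 5) is penalized because its flanking regions still score high.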
Citations
Weakly-Supervised Action Localization by Generative Attention Modeling
Baifeng Shi, Qi Dai, Yadong Mu, Jingdong Wang
- 14 Jun 2020
TL;DR: This paper proposes to model the class-agnostic frame-wise probability conditioned on the frame attention using a conditional Variational Auto-Encoder (VAE), and demonstrates the method's advantage and its effectiveness in handling the action-context confusion problem.
CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
Can Zhang, Meng Cao, Dongming Yang, Jie Chen, Yuexian Zou
- 01 Jun 2021
TL;DR: This paper proposes to refine the hard snippet representation in feature space, which guides the network to perceive precise temporal boundaries and avoid temporal interval interruption; it also introduces a Hard Snippet Mining algorithm to locate the potential hard snippets.
Adversarial Background-Aware Loss for Weakly-Supervised Temporal Activity Localization
Kyle Min, Jason J. Corso
- 23 Aug 2020
TL;DR: This work proposes a novel method for weakly-supervised temporal activity localization called A2CL-PT, which localizes the most salient activities of a video and finds other supplementary activities from non-localized parts of the video.
148
CLAWS: Clustering Assisted Weakly Supervised Learning with Normalcy Suppression for Anomalous Event Detection
TL;DR: The proposed weakly supervised anomaly detection method obtains 83.03% and 89.67% frame-level AUC performance on the UCF Crime and ShanghaiTech datasets respectively, demonstrating its superiority over the existing state-of-the-art algorithms.
146
Learning Causal Temporal Relation and Feature Discrimination for Anomaly Detection
TL;DR: This paper proposes a method consisting of four modules that leverage temporal cues and feature discrimination for anomaly detection. In it, the causal temporal relation module captures local-range temporal dependencies among features to enhance them, and the classifier projects the enhanced features to the category space using causal convolution, further expanding the temporal modeling range.
138
References
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
102.6K
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
51.9K
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon, Santosh K. Divvala, Ross Girshick, Ali Farhadi
- 27 Jun 2016
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
•Proceedings Article
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe, Christian Szegedy
- 06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Related Papers (5)
Joao Carreira, Andrew Zisserman
- 21 Jul 2017
Huijuan Xu, Abir Das, Kate Saenko
- 01 Oct 2017