End-to-End Learning of Visual Representations From Uncurated Instructional Videos

doi:10.1109/CVPR42600.2020.00990

Open Access•Proceedings Article•10.1109/CVPR42600.2020.00990

Antoine Miech, +5 more

- 14 Jun 2020

- pp 9879-9889

936

TL;DR: This work proposes MIL-NCE, a new learning approach that addresses the misalignments inherent in narrated videos, and reports that it outperforms all published self-supervised approaches on the tasks evaluated, as well as several fully supervised baselines.

Citations

Journal Article•10.1109/wacv57701.2024.00253

M³3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding

Muhammad Abdullah Jamal, +1 more

- 03 Jan 2024

Journal Article•10.1007/978-3-031-72980-5_24

RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos

A Ghamari Zare, +3 more

- 01 Jan 2024

- Lecture Notes in Computer Science

Journal Article•10.1109/TIP.2023.3279978

Contrastive Learning of Person-Independent Representations for Facial Action Unit Detection

Yong Li, +1 more

- 31 May 2023

- IEEE Transactions on Image Processing

TL;DR: This paper adopts a margin-based temporal contrastive learning paradigm to capture the temporal coherence and evolution of AUs within a clip of consecutive input facial frames.

Journal Article•10.1007/s11263-024-02272-8

Achieving Procedure-Aware Instructional Video Correlation Learning Under Weak Supervision from a Collaborative Perspective

Tianyao He, +8 more

- 04 Nov 2024

- International Journal of Computer Vision

Proceedings Article•10.1109/iccv51070.2023.01427

Exploring Temporal Concurrency for Video-Language Representation Learning

Heng Zhang, +4 more

- 01 Oct 2023

TL;DR: This paper proposes to learn video-language representations by modeling video-language pairs as Temporal Concurrent Processes (TCP) via a process-wise distance metric learning framework, and introduces a regularization term that encourages the embeddings of each modality to approximate a stochastic process, so that the inherent dynamics are preserved.

References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

117.9K

Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

99K

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and reports state-of-the-art performance on English-to-French translation.

94.2K

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: This article introduces Adam, a method for first-order gradient-based optimization of stochastic objective functions that is based on adaptive estimates of lower-order moments.

82.5K

Related Papers (5)

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Deep Residual Learning for Image Recognition

Attention is All you Need

Representation Learning with Contrastive Predictive Coding