Forecasting Hands and Objects in Future Frames

doi:10.1007/978-3-030-11015-4_12

Open Access · Book Chapter

Chenyou Fan, +2 more

- 08 Sep 2018

- pp 124-137

12 citations

TL;DR: This paper proposes a two-stream fully convolutional neural network (CNN) architecture that, given a single image frame from a video, predicts which objects will be present in future frames and where they will be located. The key idea is that the intermediate representation of a CNN model abstracts the scene information in its frame, so the representations corresponding to future frames can be predicted from that of the current frame.
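
The core idea in the TL;DR, regressing the representation of a future frame from that of the current frame, can be illustrated with a toy sketch. It assumes feature maps stored as nested lists (channels × height × width) and uses the simplest possible fully convolutional regressor, a per-channel 1×1 convolution (one scale and one bias per channel) fitted by ordinary least squares. The function names and toy data are hypothetical; this is not the paper's regression network, which is a learned CNN operating on real detector features.

```python
from typing import List, Tuple

# Hypothetical layout: one feature map = channels x height x width (nested lists).
FeatureMap = List[List[List[float]]]


def fit_channelwise_1x1(
    current: List[FeatureMap], future: List[FeatureMap]
) -> List[Tuple[float, float]]:
    """Fit a (scale, bias) pair per channel so that future ~= scale * current + bias.

    A per-channel 1x1 convolution applies the same weights at every spatial
    location, so it is fully convolutional; here it is fitted by ordinary
    least squares over all training pairs and all pixels.
    """
    params = []
    for c in range(len(current[0])):
        xs: List[float] = []
        ys: List[float] = []
        for cur, fut in zip(current, future):
            for row_x, row_y in zip(cur[c], fut[c]):
                xs.extend(row_x)
                ys.extend(row_y)
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        scale = cov_xy / var_x if var_x > 0 else 0.0  # constant input: fit bias only
        params.append((scale, mean_y - scale * mean_x))
    return params


def predict_future(current: FeatureMap, params: List[Tuple[float, float]]) -> FeatureMap:
    """Apply the fitted 1x1 convolution at every spatial position of every channel."""
    return [
        [[scale * value + bias for value in row] for row in channel]
        for channel, (scale, bias) in zip(current, params)
    ]


# Toy data: two 2-channel, 2x2 "current" feature maps whose "future" maps are an
# exact affine function of them (2x + 1 in channel 0, 0.5x - 3 in channel 1).
def toy_future(f: FeatureMap) -> FeatureMap:
    return [
        [[2.0 * v + 1.0 for v in row] for row in f[0]],
        [[0.5 * v - 3.0 for v in row] for row in f[1]],
    ]


cur_a = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 2.0], [4.0, 6.0]]]
cur_b = [[[5.0, 6.0], [7.0, 8.0]], [[8.0, 10.0], [12.0, 14.0]]]
params = fit_channelwise_1x1([cur_a, cur_b], [toy_future(cur_a), toy_future(cur_b)])
predicted = predict_future(cur_a, params)
```

The toy only shows the structure of the approach: train on (current, future) representation pairs, then predict the future representation from the current one alone. Real detector features are not related by a per-channel affine map, which is why a learned CNN regressor is used instead.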

Citations

Proceedings Article · doi:10.1109/CVPR.2019.00731

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Osama Makansi, +3 more

- 15 Jun 2019

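TL;DR: In this paper, a winner-takes-all loss and an iterative grouping of samples into multiple modes are proposed to predict multimodal distributions of future states. The paper also discusses how to evaluate such predictions, including the common real-world scenario in which only a single sample from the ground-truth distribution is available.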

179 citations

Proceedings Article · doi:10.1109/CVPR.2019.00731

Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction

Osama Makansi, +3 more

- 09 Jun 2019

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work predicts several samples of the future with a winner-takes-all loss and iteratively groups the samples into multiple modes; experiments on synthetic and real data show that the approach yields good estimates of multimodal distributions and avoids mode collapse.

132 citations
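
The winner-takes-all (WTA) loss named in the TL;DRs above can be sketched as follows: of K predicted hypotheses, only the one closest to the ground truth is penalised, so during training only that "winner" would receive a gradient and different hypotheses can specialise on different modes. This minimal sketch assumes 2-D point predictions and squared Euclidean distance; the function names are hypothetical, and the iterative grouping of samples into modes (the fitting stage) is not shown.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def winner_takes_all_loss(hypotheses: Sequence[Point], target: Point) -> Tuple[float, int]:
    """Return (loss, index of the winning hypothesis).

    Only the hypothesis closest to the target contributes to the loss;
    the other hypotheses are left untouched by this training sample.
    """
    distances = [squared_distance(h, target) for h in hypotheses]
    winner = min(range(len(distances)), key=distances.__getitem__)
    return distances[winner], winner


# Three hypotheses for a future 2-D position; the ground truth lies next to the second one.
hyps = [(0.0, 0.0), (4.0, 3.0), (-2.0, 5.0)]
loss, idx = winner_takes_all_loss(hyps, (4.0, 4.0))  # loss == 1.0, idx == 1
```

Averaging the loss over all hypotheses instead would pull every hypothesis toward the conditional mean of the future, which is the kind of collapse to a single prediction that the multi-hypothesis formulation is meant to avoid.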

Journal Article · doi:10.1016/J.CVIU.2021.103252

Predicting the future from first person (egocentric) vision: A survey

Ivan Rodin, +3 more

- 01 Oct 2021

- Computer Vision and Image Understanding

TL;DR: The survey highlights that methods for future prediction from egocentric vision can have a significant impact across a range of applications, and that further research should be devoted to standardising tasks and proposing datasets that consider real-world scenarios, such as industrial ones.

56 citations

Proceedings Article · doi:10.1109/CVPR42600.2020.00441

Multimodal Future Localization and Emergence Prediction for Objects in Egocentric View With a Reachability Prior

Osama Makansi, +3 more

- 14 Jun 2020

TL;DR: Experiments show that the reachability prior combined with multi-hypothesis learning improves multimodal prediction of the future location of tracked objects and, for the first time, of the emergence of new objects.

42 citations

Proceedings Article · doi:10.1109/icip46576.2022.9897636

Early Pedestrian Intent Prediction via Features Estimation

- 16 Oct 2022

TL;DR: In this paper, a model for egocentric action anticipation (RU-LSTM) is adapted to predict pedestrians' crossing intentions using an attention-based fusion mechanism.

3 citations

References

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

51.9K citations
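
One argument the VGG paper gives for depth is that a stack of small 3×3 convolutions covers the same receptive field as a single larger filter while using fewer weights (and adding more non-linearities). The arithmetic below assumes stride-1 convolutions that map C channels to C channels and ignores biases; the helper names and the value C = 256 are arbitrary choices for illustration.

```python
def stacked_receptive_field(kernel: int, layers: int) -> int:
    """Receptive field of `layers` stacked stride-1 convolutions with a square kernel."""
    return 1 + layers * (kernel - 1)


def conv_weights(kernel: int, channels: int, layers: int = 1) -> int:
    """Weight count (biases ignored) of `layers` convs mapping C channels to C channels."""
    return layers * kernel * kernel * channels * channels


C = 256
rf_three_3x3 = stacked_receptive_field(3, 3)  # 7: the same field as one 7x7 conv
w_three_3x3 = conv_weights(3, C, layers=3)    # 3 * (3^2 * C^2) = 27 * C^2
w_one_7x7 = conv_weights(7, C)                # 7^2 * C^2 = 49 * C^2
```

For any C, the three-layer 3×3 stack needs 27C² weights against 49C² for a single 7×7 layer, roughly 45% fewer, for the same 7×7 receptive field.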

Book Chapter · doi:10.1007/978-3-319-46448-0_2

SSD: Single Shot MultiBox Detector

Wei Liu, +6 more

- 08 Oct 2016

TL;DR: The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component.

35.5K citations

Book Chapter · doi:10.1007/978-3-319-46448-0_2

SSD: Single Shot MultiBox Detector

Wei Liu, +6 more

- 08 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, and combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.

14K citations
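
The default-box discretisation described in the SSD TL;DRs can be sketched as below. It follows the scale rule from the SSD paper: scales spaced linearly between s_min = 0.2 and s_max = 0.9 across the m feature maps, box width s_k·√a and height s_k/√a for aspect ratio a, plus one extra square box of scale √(s_k·s_{k+1}) per location. The function names, the (cx, cy, w, h) tuple layout, and extrapolating s_{k+1} for the last feature map are assumptions of this sketch; it only generates boxes and omits matching, prediction heads, and the loss.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalised to [0, 1]


def layer_scale(k: int, m: int, s_min: float = 0.2, s_max: float = 0.9) -> float:
    """Scale of the k-th of m feature maps (k = 1..m, m >= 2), linearly spaced."""
    return s_min + (s_max - s_min) * (k - 1) / (m - 1)


def default_boxes(fmap_size: int, k: int, m: int, aspect_ratios: List[float]) -> List[Box]:
    """Default boxes for a square fmap_size x fmap_size feature map at level k."""
    s_k = layer_scale(k, m)
    s_next = layer_scale(k + 1, m)  # assumption: extrapolated when k == m
    boxes: List[Box] = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            # Box centres sit at the middle of each feature-map cell.
            cx = (j + 0.5) / fmap_size
            cy = (i + 0.5) / fmap_size
            for a in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(a), s_k / math.sqrt(a)))
            # Extra square box with an intermediate scale.
            extra = math.sqrt(s_k * s_next)
            boxes.append((cx, cy, extra, extra))
    return boxes


# Coarsest of 6 feature maps, 3x3 cells, 3 aspect ratios:
# 3 x 3 locations x (3 + 1) boxes = 36 default boxes.
boxes = default_boxes(3, k=6, m=6, aspect_ratios=[1.0, 2.0, 0.5])
```

Coarse feature maps (large k) get large default boxes and fine maps get small ones, which is how predictions from several resolutions cover objects of different sizes.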

Proceedings Article · doi:10.1109/CVPR.2016.350

The Cityscapes Dataset for Semantic Urban Scene Understanding

Marius Cordts, +8 more

- 01 Jun 2016

TL;DR: This work introduces Cityscapes, a benchmark suite and large-scale dataset for training and testing approaches to pixel-level and instance-level semantic labeling, which exceeds previous attempts in terms of dataset size, annotation richness, scene variability, and complexity.

11.5K citations

Proceedings Article

Two-Stream Convolutional Networks for Action Recognition in Videos

Karen Simonyan, +1 more

- 08 Dec 2014

TL;DR: This work proposes a two-stream ConvNet architecture which incorporates spatial and temporal networks and demonstrates that a ConvNet trained on multi-frame dense optical flow is able to achieve very good performance in spite of limited training data.

8.3K citations
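
The two-stream idea can be sketched as follows: the temporal stream's input stacks the horizontal and vertical optical-flow fields of L consecutive frames into 2L channels, and the class scores of the spatial (appearance) and temporal (motion) streams are fused late. The original paper compares fusion by averaging with fusion by a linear SVM; this sketch shows only averaging of softmax scores, omits the ConvNets themselves, and uses hypothetical function names and made-up logits.

```python
import math
from typing import List

Grid = List[List[float]]  # one H x W optical-flow component


def stack_optical_flow(flow_x: List[Grid], flow_y: List[Grid]) -> List[Grid]:
    """Temporal-stream input: interleave x and y flow of L frames into 2L channels."""
    channels: List[Grid] = []
    for dx, dy in zip(flow_x, flow_y):
        channels.extend([dx, dy])
    return channels


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_fusion(spatial_logits: List[float], temporal_logits: List[float]) -> List[float]:
    """Late fusion: average the class posteriors of the spatial and temporal streams."""
    return [(p + q) / 2 for p, q in zip(softmax(spatial_logits), softmax(temporal_logits))]


# L = 10 frames of 4x4 flow give 20 input channels for the temporal stream.
zero = [[0.0] * 4 for _ in range(4)]
stacked = stack_optical_flow([zero] * 10, [zero] * 10)

# The appearance stream mildly prefers class 0; the motion stream strongly prefers class 1.
fused = average_fusion([2.0, 1.0, 0.0], [0.0, 3.0, 0.0])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy case the confident motion stream outweighs the weakly confident appearance stream, so the fused prediction is class 1.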

Related Papers (5)

Segmenting the Future

Anticipating Visual Representations from Unlabeled Video

Future Person Localization in First-Person Videos

Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems

Learning Object-Centric Transformation for Video Prediction