Deep Learning Driven Visual Path Prediction From a Single Image
TL;DR: The authors propose a deep learning framework that performs deep feature learning for visual representation jointly with spatiotemporal context modeling, followed by a unified path-planning scheme that predicts the path from the analytic results returned by the deep context models.
Abstract: Inference and prediction are significant capabilities of visual systems. Visual path prediction is an important and challenging task among them, with the goal of inferring the future path of a visual object in a static scene. The task is complicated because it requires high-level semantic understanding of both the scenes and the underlying motion patterns in video sequences. In practice, cluttered situations place further demands on the effectiveness and robustness of models. Motivated by these observations, we propose a deep learning framework that simultaneously performs deep feature learning for visual representation in conjunction with spatiotemporal context modeling. A unified path-planning scheme then makes accurate path predictions based on the analytic results returned by the deep context models. The highly effective visual representation and deep context models give our framework a deep semantic understanding of the scenes and motion patterns, consequently improving performance on the visual path prediction task. In experiments, we extensively evaluate the model’s performance by constructing two large benchmark datasets adapted from video tracking datasets. The qualitative and quantitative experimental results show that our approach outperforms the state-of-the-art approaches and has better generalization capability.
Citations
Human Motion Trajectory Prediction: A Survey
Andrey Rudenko, Luigi Palmieri, Michael Herman, Kris M. Kitani, Dariu M. Gavrila, Kai O. Arras
TL;DR: As intelligent autonomous systems become more common in human environments, their ability to perceive, understand, and anticipate human behavior becomes increasingly important. This survey of human motion trajectory prediction gives an overview of the existing datasets and performance metrics, discusses limitations of the state of the art, and outlines directions for further research.
821
Future Person Localization in First-Person Videos
Takuma Yagi, Karttikeya Mangalam, Ryo Yonetani, Yoichi Sato
- 18 Jun 2018
TL;DR: Introduces a new task, predicting the future locations of people observed in first-person videos, and addresses it with a prediction framework built on a multi-stream convolution-deconvolution architecture that is effective on a new dataset as well as on a public social interaction dataset.
Context-Aware Trajectory Prediction
Federico Bartoli, Giuseppe Lisanti, Lamberto Ballan, Alberto Del Bimbo
- 01 Aug 2018
TL;DR: In this article, a context-aware recurrent neural network LSTM model is proposed to predict human motion in crowded spaces such as a sidewalk, a museum or a shopping mall.
198
Overcoming Limitations of Mixture Density Networks: A Sampling and Fitting Framework for Multimodal Future Prediction
Osama Makansi, Eddy Ilg, Özgün Çiçek, Thomas Brox
- 15 Jun 2019
TL;DR: In this paper, several samples of the future are predicted with a winner-takes-all loss and then iteratively grouped into multiple modes, in order to predict multimodal distributions over future states.
References
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
99K
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: A large, deep convolutional neural network achieves state-of-the-art performance on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
- 23 Jun 2014
TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler, Rob Fergus
- 06 Sep 2014
TL;DR: A novel visualization technique is introduced that gives insight into the function of intermediate feature layers and the operation of the classifier in large Convolutional Network models, used in a diagnostic role to find model architectures that outperform Krizhevsky et al on the ImageNet classification benchmark.
16.6K
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
- 03 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.