Proceedings Article, DOI: 10.1109/indicon56171.2022.10039829
Sequential Memory Modelling for Video Captioning
24 Nov 2022
TL;DR: This paper uses an end-to-end encoder-decoder framework based on deep learning to generate video captions, and describes the model, dataset, and parameters used to evaluate it.
Abstract: In recent years, the automatic generation of natural language descriptions of video has become a focus of research in deep learning and natural language processing. Video understanding has many applications, such as video search and indexing, but video captioning remains a difficult problem for complex and diverse video content. Bridging video and natural language sentences is still an open issue, and multiple methods have been proposed to understand video better and to generate sentences automatically. Deep learning methods, supported by high-performance, high-speed computing, have become a major direction in video processing. This paper discusses an end-to-end encoder-decoder framework based on deep learning for generating captions, and describes the model, dataset, and parameters used to evaluate it.