Sequential Memory Modelling for Video Captioning

doi:10.1109/indicon56171.2022.10039829

Proceedings Article

24 Nov 2022

TL;DR: This paper uses an end-in-frame encoder-decoder network, based on a deep learning approach, to generate video captions. It also describes the model, the dataset and the parameters used to evaluate the model.

Abstract: In recent years, the automatic generation of natural language descriptions of video has become a focus of research in deep learning and natural language processing. Video understanding has multiple applications, such as video search and indexing, but video captioning remains a sophisticated problem because video content is complex and diverse. Bridging video and natural language is still an open issue, both for understanding video better and for developing methods that generate descriptions automatically. Deep learning has become a major direction in video processing because of its performance and the availability of high-speed computing. This paper discusses an end-in-frame encoder-decoder network, based on a deep learning approach, that generates captions. We describe the model, the dataset and the parameters used to evaluate the model.
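
The encoder-decoder pipeline mentioned in the abstract can be illustrated with a minimal sketch. This page does not specify the paper's architecture, so everything below is an assumption. The encoder mean-pools hypothetical per-frame feature vectors and projects them into an initial hidden state. The decoder is a plain tanh recurrent cell, standing in for whatever sequential memory unit the paper actually uses, and it emits words greedily until an `<eos>` token. The class name `ToyVideoCaptioner`, the toy vocabulary and the random, untrained weights are all hypothetical. The generated caption is therefore arbitrary, and only the data flow is meaningful.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rng: random.Random, rows: int, cols: int, scale: float = 0.5) -> Matrix:
    """Uniform random weights in [-scale, scale]; stands in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class ToyVideoCaptioner:
    """Hypothetical encoder-decoder captioner with untrained random weights."""

    def __init__(self, vocab: List[str], feat_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.w_enc = rand_matrix(rng, hidden_dim, feat_dim)    # pooled video feature -> hidden
        self.embed = rand_matrix(rng, len(vocab), hidden_dim)  # one embedding row per token
        self.w_hh = rand_matrix(rng, hidden_dim, hidden_dim)   # recurrent (memory) weights
        self.w_out = rand_matrix(rng, len(vocab), hidden_dim)  # hidden -> vocabulary scores

    def encode(self, frames: List[Vector]) -> Vector:
        """Encoder: average the per-frame features, project, and squash with tanh."""
        if not frames:
            raise ValueError("need at least one frame")
        pooled = [sum(col) / len(frames) for col in zip(*frames)]
        return [math.tanh(x) for x in matvec(self.w_enc, pooled)]

    def decode(self, h: Vector, max_len: int = 10) -> List[str]:
        """Decoder: greedy decoding from <bos> until <eos> or max_len words."""
        token = self.vocab.index("<bos>")
        # Never emit <bos> as an output word.
        candidates = [i for i, w in enumerate(self.vocab) if w != "<bos>"]
        words: List[str] = []
        for _ in range(max_len):
            # New hidden state mixes the previous token's embedding with the old state.
            rec = matvec(self.w_hh, h)
            h = [math.tanh(e + r) for e, r in zip(self.embed[token], rec)]
            scores = matvec(self.w_out, h)
            token = max(candidates, key=lambda i: scores[i])
            if self.vocab[token] == "<eos>":
                break
            words.append(self.vocab[token])
        return words

    def caption(self, frames: List[Vector], max_len: int = 10) -> str:
        """Encode the frames, then decode a caption as a space-separated string."""
        return " ".join(self.decode(self.encode(frames), max_len))


if __name__ == "__main__":
    vocab = ["<bos>", "<eos>", "a", "man", "is", "playing", "guitar"]
    # Three hypothetical 4-dimensional frame features (e.g. from a CNN).
    frames = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.0, 0.5], [0.3, 0.3, 0.1, 0.2]]
    model = ToyVideoCaptioner(vocab, feat_dim=4, hidden_dim=8)
    print(model.caption(frames))  # arbitrary words: the weights are untrained
```

Mean pooling is the simplest encoder choice, but it discards frame order. A trained system would learn these weights and would typically use a gated recurrent unit such as an LSTM in place of the plain tanh cell.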

Related Papers (5)

Automatic Video Summary and Description

Segment Based Indexing Technique for Video Data File

Entity Detection for Information Retrieval in Video Streams

Practical video indexing and retrieval system

Object Segmentation in Video Sequences by using Single Frame Processing