Open Access • Posted Content
Image Captioning based on Deep Learning Methods: A Survey.
TL;DR: This survey reviews advances in image captioning based on Deep Learning methods, covering the Encoder-Decoder structure, improved methods in the Encoder, improved methods in the Decoder, and other improvements.
Abstract: Image captioning is a challenging task that is attracting increasing attention in the field of Artificial Intelligence. It can be applied to efficient image retrieval, intelligent guidance for the blind, human-computer interaction, and similar areas. In this paper, we present a survey of advances in image captioning based on Deep Learning methods, including the Encoder-Decoder structure, improved methods in the Encoder, improved methods in the Decoder, and other improvements. Furthermore, we discuss future research directions.
Citations
Semantic interdisciplinary evaluation of image captioning models
Uddagiri Sirisha,B. Sai Chandana +1 more
TL;DR: In this article, the authors examine and analyze image captioning models used across various domains, and extract insights to determine the best combined architecture for a new application without ignoring contextual semantics.
17
Facilitated Deep Learning Models for Image Captioning
Imtinan Azhar,Imad Afyouni,Ashraf Elnagar +2 more
- 24 Mar 2021
TL;DR: In this paper, a mixture of object detection and attention-enriched deep learning models is used to extract the image features, and then an extended version of Recurrent Neural Networks (LSTM) with attention-enhanced features is adopted to generate the caption.
11
A Survey on Attention-Based Models for Image Captioning
Asmaa A. E. Osman,Mohamed A. Wahby Shalaby,Mona M. Soliman,Khaled M. F. Elsayed +3 more
TL;DR: This article presents a survey of attention-based models for image captioning, including new categories not covered in other survey papers, and discusses all categories and subcategories of attention-based approaches in detail.
An Enhanced Hybrid Deep Learning Model for Efficient Automatic Image Captioning
Eliyah Immanuel Thavaraj A,Sujitha Juliet,Anila Sharon J +2 more
- 18 Aug 2023
TL;DR: This paper uses an attention mechanism to generate image captions after considering the recognized items in the scene, and carries out a semantic similarity analysis between the generated descriptions and the actual image descriptions.
1
Sequential Memory Modelling for Video Captioning
24 Nov 2022
TL;DR: In this paper, an encoder-decoder network end-in-frame based on a deep learning approach is used to generate video subtitles, and the model, dataset, and parameters used to evaluate the model are described.
References
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
99K
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin,Michael Maire,Serge Belongie,James Hays,Pietro Perona,Deva Ramanan,Piotr Dollár,C. Lawrence Zitnick +7 more
- 06 Sep 2014
TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni,Salim Roukos,Todd Ward,Wei-Jing Zhu +3 more
- 06 Jul 2002
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
•Posted Content
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
25.3K
•Proceedings Article
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever,Oriol Vinyals,Quoc V. Le +2 more
- 08 Dec 2014
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
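BLEU, listed among the references above, was proposed for machine translation; scoring a generated caption against reference captions follows the same recipe. The block below is a minimal sketch of sentence-level BLEU, assuming whitespace-tokenized captions, uniform n-gram weights, and no smoothing. The function names (`ngram_counts`, `modified_precision`, `brevity_penalty`, `bleu`) are illustrative and do not come from any of the listed papers or from a specific library.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand = ngram_counts(candidate, n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return clipped / sum(cand.values())


def brevity_penalty(candidate, references):
    """Penalize candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((len(ref) for ref in references), key=lambda n: (abs(n - c), n))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Geometric mean of modified precisions (orders 1..max_n) times the
    brevity penalty. Without smoothing, any zero precision gives 0."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_mean)


# Hypothetical captions, made up for illustration only.
refs = ["a dog runs on the beach".split(),
        "a dog is running along the shore".split()]
score = bleu("a dog runs on the sand".split(), refs)
```

Because this sketch applies no smoothing, short captions with no matching 4-gram score exactly zero; evaluation toolkits usually compute BLEU at the corpus level or apply smoothing to avoid this.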