Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Open Access • Posted Content

- 06 Dec 2016

- arXiv: Computer Vision and Pattern Recog...

1.2K

TL;DR: This paper proposes an adaptive attention model with a visual sentinel. At each time step the model decides whether to attend to the image (and if so, where) or to fall back on the visual sentinel, in order to extract meaningful information for sequential word generation. The authors report a new state of the art by a significant margin.
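
A minimal sketch of the gating idea in the TL;DR, assuming the attention scores have already been computed. The function name `adaptive_context`, its arguments, and the use of NumPy are illustrative assumptions, not the authors' code. The sentinel score competes with the region scores in a single softmax, and its weight `beta` acts as the "look or don't look" gate.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def adaptive_context(region_feats, sentinel, region_scores, sentinel_score):
    """Blend k image-region features with a visual sentinel vector.

    region_feats:   (k, d) array of region features (hypothetical input)
    sentinel:       (d,) sentinel vector (hypothetical input)
    region_scores:  (k,) unnormalized attention scores for the regions
    sentinel_score: float, unnormalized score for the sentinel
    """
    # One softmax over the k region scores plus the sentinel score.
    weights = softmax(np.append(region_scores, sentinel_score))
    beta = weights[-1]  # gate: how much to rely on the sentinel
    # The region weights sum to 1 - beta, so this equals
    # beta * sentinel + (1 - beta) * (attention-weighted region context).
    context = weights[:-1] @ region_feats + beta * sentinel
    return context, beta
```

When `beta` is close to 1 the context vector is dominated by the sentinel, which corresponds to generating a word without looking at the image; when it is close to 0 the context is ordinary spatial attention over the regions.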

Citations

•Posted Content

simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions

Fenglin Liu, +4 more

- 27 Aug 2018

- arXiv: Computation and Language

TL;DR: The authors propose the stepwise image-topic merging network (simNet), which makes use of two kinds of attention at the same time. The decoder adaptively merges the attentive information in the extracted topics and the image according to the generated context, so that the visual information and the semantic information can be effectively combined.

28

•Proceedings Article•10.18653/V1/K17-1044

Natural Language Generation for Spoken Dialogue System using RNN Encoder-Decoder Networks

Van-Khanh Tran, +1 more

- 01 Jun 2017

- arXiv: Computation and Language

TL;DR: This paper proposes a Recurrent Neural Network based Encoder-Decoder (RNN Encoder-Decoder) architecture to generate natural language sentences in spoken dialogue systems. The proposed generator can be jointly trained on both sentence planning and surface realization.

28

•Proceedings Article

h-detach: Modifying the LSTM Gradient Towards Better Optimization

Devansh Arpit, +5 more

- 27 Sep 2018

TL;DR: This paper proposes a stochastic algorithm called h-detach for LSTM optimization. It targets the problem that the gradient components through the linear path (cell state) in the computational graph can get suppressed, which can prevent LSTMs from capturing long-term dependencies.

27

•Journal Article•10.1007/s11063-022-10759-z

A New Attention-Based LSTM for Image Captioning

Fen Xiao, +2 more

- 14 Feb 2022

- Neural Processing Letters

TL;DR: An attentional LSTM (ALSTM) is proposed, and the paper shows how to integrate it within a state-of-the-art automatic image captioning framework and how to obtain effective visual/context attention to update the input vector.

27

•Posted Content

Interpretable Counting for Visual Question Answering

Alexander Trott, +2 more

- 23 Dec 2017

- arXiv: Artificial Intelligence

TL;DR: The model sequentially selects from detected objects and learns interactions between objects that influence subsequent selections, and it outperforms the state-of-the-art architecture for VQA on multiple metrics that evaluate counting.

27

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.

198.7K

•Book Chapter•10.1007/978-3-319-10602-1_48

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

- 06 Sep 2014

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding. It is built by gathering images of complex everyday scenes containing common objects in their natural context.

51.7K

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Proceedings Article•10.3115/V1/D14-1179

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

Kyunghyun Cho, +8 more

- 01 Jan 2014

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

28.6K

•Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Jan 2015

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture. The paper proposes to extend this architecture by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

25.7K

Related Papers (5)

Bleu: a Method for Automatic Evaluation of Machine Translation

Long short-term memory

Very Deep Convolutional Networks for Large-Scale Image Recognition

Deep Residual Learning for Image Recognition

Adam: A Method for Stochastic Optimization