Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Open Access • Posted Content

- 06 Dec 2016

- arXiv: Computer Vision and Pattern Recog...

1.2K

TL;DR: The paper proposes an adaptive attention model with a visual sentinel that decides, at each time step, whether to attend to the image and, if so, where, in order to extract meaningful information for sequential word generation; the model set a new state of the art by a significant margin.
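
The "whether to attend, and where" decision can be illustrated with a minimal sketch. It assumes the gate is a single softmax over the image-region scores plus one extra score for the sentinel, so the last weight (called `beta` here) is the share given to the sentinel rather than to the image. The function names, argument names, and toy inputs are hypothetical; in a real captioning model the scores and the sentinel vector would come from a learned decoder, which is not shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, sentinel, sentinel_score):
    """Mix attended image features with a sentinel vector (illustrative only).

    region_feats:   list of k feature vectors (lists of floats, same length)
    region_scores:  k unnormalized attention scores, one per region
    sentinel:       vector with the same length as a region feature
    sentinel_score: unnormalized score for the sentinel
    Returns (context, beta), where beta is the weight placed on the sentinel.
    """
    # One softmax over k regions plus the sentinel: the weights sum to 1.
    weights = softmax(list(region_scores) + [sentinel_score])
    beta = weights[-1]
    dim = len(sentinel)
    # Start from the sentinel's share, then add each region's share.
    context = [beta * sentinel[d] for d in range(dim)]
    for w, feat in zip(weights[:-1], region_feats):
        for d in range(dim):
            context[d] += w * feat[d]
    return context, beta


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0]]
    ctx, beta = adaptive_context(feats, [0.0, 0.0], [0.0, 0.0], 0.0)
    print(ctx, beta)
```

When `beta` is close to 1 the context is dominated by the sentinel (the model is "not looking"); when it is close to 0 the context reduces to ordinary soft attention over the image regions.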

Citations

•Proceedings Article•10.1109/ICCV.2019.00435

Learning to Collocate Neural Modules for Image Captioning

Xu Yang, +2 more

- 01 Oct 2019

TL;DR: The authors propose learning to collocate neural modules to generate the "inner pattern" connecting the visual encoder and the language decoder, achieving state-of-the-art image captioning performance.

102

•Posted Content

Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering

Gao Peng, +6 more

- 13 Dec 2018

- arXiv: Computer Vision and Pattern Recog...

TL;DR: The paper proposes a novel method that dynamically fuses multi-modal features with intra- and inter-modality information flow, alternately passing dynamic information within and across the visual and language modalities, and that can robustly capture the high-level interactions between the language and vision domains.

102

•Posted Content

ARGAN: Attentive Recurrent Generative Adversarial Network for Shadow Detection and Removal

Bin Ding, +3 more

- 04 Aug 2019

- arXiv: Computer Vision and Pattern Recog...

TL;DR: The authors propose an attentive recurrent generative adversarial network (ARGAN) to detect and remove shadows in an image; it proceeds in multiple progressive steps, and a discriminator is designed to classify whether the output image of the last progressive step is real or fake.

101

•Journal Article•10.1109/TIP.2018.2889922

Topic-Oriented Image Captioning Based on Order-Embedding

Niange Yu, +4 more

- 01 Jun 2019

- IEEE Transactions on Image Processing

TL;DR: Experiments on the image captioning task on the MS-COCO and Flickr30K datasets validate the framework: different given topics lead to different captions describing specific aspects of the given image, and the generated captions are of higher quality than those of a control model without a topic as input.

101

•Journal Article•10.1016/J.NEUCOM.2019.09.086

SARPNET: Shape attention regional proposal network for liDAR-based 3D object detection

Ye Yangyang, +4 more

- 28 Feb 2020

- Neurocomputing

TL;DR: A novel 3D object detection network called SARPNET is introduced, which deploys a new low-level feature encoder to remedy the sparsity and inhomogeneity of LiDAR point clouds with an even sample method, and embodies a shape attention mechanism that learns the statistical 3D shape priors of objects and uses them to spatially enhance semantic embeddings.

101

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won 1st place in the ILSVRC 2015 classification task.

198.7K

•Book Chapter•10.1007/978-3-319-10602-1_48

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

- 06 Sep 2014

TL;DR: The paper presents a new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding; the dataset gathers images of complex everyday scenes containing common objects in their natural context.

51.7K

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Proceedings Article•10.3115/V1/D14-1179

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

Kyunghyun Cho, +8 more

- 01 Jan 2014

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

28.6K

•Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Jan 2015

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

25.7K

Related Papers (5)

Bleu: a Method for Automatic Evaluation of Machine Translation

Long short-term memory

Very Deep Convolutional Networks for Large-Scale Image Recognition

Deep Residual Learning for Image Recognition

Adam: A Method for Stochastic Optimization