Semantic Compositional Networks for Visual Captioning

Open Access • Posted Content


- 23 Nov 2016

- arXiv: Computer Vision and Pattern Recog...

285

TL;DR: Experimental results show that the proposed method significantly outperforms prior state-of-the-art approaches, across multiple evaluation metrics.


Figures

Figure 1: Model architecture and illustration of semantic composition. Each triangle symbol represents an ensemble of tag-dependent weight matrices. The number next to a semantic concept (i.e., a tag) is the probability that the corresponding semantic concept is present in the input image.

Table 3: Results on the BLEU-4 (B-4), METEOR (M), and CIDEr-D (C) metrics compared to other state-of-the-art results and baselines on Youtube2Text.

Table 2: Comparison to published state-of-the-art image captioning models on the blind test set as reported by the COCO test server. SCN-LSTM is our model. ATT refers to ATT VC [47], OV refers to OriolVinyals [41], and MSR Cap refers to MSR Captivator [8].

Figure 6: Detected tags and sentence generation results on COCO. The output captions are generated by: 1) LSTM-R, 2) LSTM-RT2, and 3) our SCN-LSTM.

Figure 4: Detected tags and sentence generation results on COCO. The output captions are generated by: 1) LSTM-R, 2) LSTM-RT2, and 3) our SCN-LSTM.

Figure 3: Illustration of semantic composition. Our model can adjust the caption smoothly as the semantic concepts are modified.
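Figure 1 describes each weight as an ensemble of tag-dependent matrices mixed according to the detected tag probabilities, and Figure 3 shows captions changing smoothly as those probabilities are modified. The sketch below illustrates that idea under a loud assumption: the ensemble is taken to be the simplest possible one, a probability-weighted sum of one matrix per tag. The names `compose_weight` and `semantic_step`, the shapes, and the use of a vanilla tanh RNN step in place of the paper's LSTM are all hypothetical choices for illustration; the paper itself may parameterize the ensemble differently.

```python
import numpy as np

def compose_weight(tag_probs, tag_weights):
    """Mix K per-tag matrices into one weight matrix.

    Assumption (not taken from the paper): the ensemble is the
    probability-weighted sum  W(s) = sum_k s[k] * W_k.
    tag_probs:   shape (K,)       probability that each tag is present
    tag_weights: shape (K, H, D)  one H x D matrix per tag
    """
    s = np.asarray(tag_probs, dtype=float)
    W = np.asarray(tag_weights, dtype=float)
    # Contract the tag axis of s with the first axis of W -> shape (H, D).
    return np.tensordot(s, W, axes=1)

def semantic_step(x, h_prev, tag_probs, W_tags, U_tags):
    """One hypothetical vanilla-RNN step (standing in for an LSTM) whose
    input and recurrent weights both depend on the detected tags."""
    W = compose_weight(tag_probs, W_tags)
    U = compose_weight(tag_probs, U_tags)
    return np.tanh(W @ x + U @ h_prev)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, H, D = 3, 4, 5                      # tags, hidden size, input size
    W_tags = rng.normal(size=(K, H, D))
    U_tags = rng.normal(size=(K, H, H))
    x, h0 = rng.normal(size=D), np.zeros(H)
    # Nudging one tag probability changes the composed weights, and hence
    # the hidden state, continuously rather than abruptly.
    h_a = semantic_step(x, h0, [0.9, 0.1, 0.5], W_tags, U_tags)
    h_b = semantic_step(x, h0, [0.8, 0.1, 0.5], W_tags, U_tags)
    print(np.abs(h_a - h_b).max())
```

Because the composed matrix is linear in the tag probabilities, editing one probability moves every weight proportionally, which is one plausible way to get the smooth caption changes that Figure 3 illustrates.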

Citations

Proceedings Article • 10.1109/CVPR.2018.00143

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

Tao Xu, +6 more

- 18 Jun 2018

TL;DR: AttnGAN uses an attentional generative network to synthesize fine-grained details in different sub-regions of the image by attending to the relevant words in the natural-language description.


2.1K

Proceedings Article • 10.1109/ICCV.2019.00473

Attention on Attention for Image Captioning

Lun Huang, +3 more

- 01 Oct 2019

TL;DR: AoANet introduces an Attention on Attention (AoA) module, which extends conventional attention mechanisms to determine the relevance between attention results and queries; the resulting model achieves state-of-the-art performance.


1.1K

Journal Article • 10.1145/3295748

A Comprehensive Survey of Deep Learning for Image Captioning

Md. Zakir Hossain, +3 more

- 04 Feb 2019

- ACM Computing Surveys

TL;DR: A comprehensive review of deep-learning-based image captioning techniques, in which the authors discuss the foundations of these techniques and analyze their performance, strengths, and limitations.


934

Journal Article • 10.1109/TMM.2017.2729019

Video Captioning With Attention-Based LSTM and Semantic Consistency

Lianli Gao, +4 more

- 19 Jul 2017

- IEEE Transactions on Multimedia

TL;DR: Proposes aLSTMs, a novel end-to-end framework built on an attention-based LSTM with semantic consistency, which translates videos into natural-language sentences and achieves results competitive with or better than state-of-the-art video captioning baselines on both BLEU and METEOR.


729

Proceedings Article • 10.1109/CVPR.2018.00943

TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-Rays

Xiaosong Wang, +4 more

- 18 Jun 2018

TL;DR: A novel Text-Image Embedding network (TieNet) is proposed for extracting the distinctive image and text representations of chest X-rays and multi-level attention models are integrated into an end-to-end trainable CNN-RNN architecture for highlighting the meaningful text words and image regions.


575

...


References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.


138.5K

Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.


117.9K

Journal Article • 10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.


99K

Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: Introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions based on adaptive estimates of lower-order moments.


82.5K

Book Chapter • 10.1007/978-3-319-10602-1_48

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

- 06 Sep 2014

TL;DR: Presents a new dataset aimed at advancing the state of the art in object recognition by placing it in the context of the broader question of scene understanding, gathering images of complex everyday scenes that contain common objects in their natural context.


51.7K

...



Related Papers (5)

Bleu: a Method for Automatic Evaluation of Machine Translation

Show and tell: A neural image caption generator

CIDEr: Consensus-based image description evaluation

Meteor Universal: Language Specific Translation Evaluation for Any Target Language

Collecting Highly Parallel Data for Paraphrase Evaluation