Nonparametric Method for Data-driven Image Captioning
Rebecca Mason, Eugene Charniak
- 01 Jun 2014
- pp 592-598
TL;DR: This work addresses the challenge of noisy estimations of visual content and poor alignment between images and human-written captions by estimating a word frequency representation of the visual content of a query image to cast caption generation as an extractive summarization problem.
Abstract: We present a nonparametric density estimation technique for image caption generation. Data-driven matching methods have been shown to be effective for a variety of complex problems in Computer Vision. These methods reduce an inference problem for an unknown image to finding an existing labeled image which is semantically similar. However, related approaches for image caption generation (Ordonez et al., 2011; Kuznetsova et al., 2012) are hampered by noisy estimations of visual content and poor alignment between images and human-written captions. Our work addresses this challenge by estimating a word frequency representation of the visual content of a query image. This allows us to cast caption generation as an extractive summarization problem. Our model strongly outperforms two state-of-the-art caption extraction systems according to human judgments of caption relevance.
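The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` parameter, the average-word-probability score, and all function names are assumptions chosen for illustration. Given captions of visually similar images with their distances to the query image, it estimates a kernel-weighted word frequency distribution and then extracts the candidate caption whose words are most probable under that distribution.

```python
import math
from collections import Counter


def estimate_word_distribution(neighbors, bandwidth=1.0):
    """Estimate word probabilities from (distance, caption) pairs.

    Each caption's words are weighted by a Gaussian kernel on the visual
    distance to the query image (the kernel choice is an assumption).
    """
    weighted = Counter()
    for distance, caption in neighbors:
        weight = math.exp(-((distance / bandwidth) ** 2))
        for word in caption.lower().split():
            weighted[word] += weight
    total = sum(weighted.values())
    return {word: count / total for word, count in weighted.items()}


def score_caption(caption, word_probs):
    """Average estimated probability of the caption's words."""
    words = caption.lower().split()
    if not words:
        return 0.0
    return sum(word_probs.get(word, 0.0) for word in words) / len(words)


def extract_caption(neighbors, bandwidth=1.0):
    """Pick the neighbor caption that best matches the estimated distribution."""
    word_probs = estimate_word_distribution(neighbors, bandwidth)
    captions = [caption for _, caption in neighbors]
    return max(captions, key=lambda c: score_caption(c, word_probs))


# Hypothetical neighbors: (visual distance to query image, human-written caption).
neighbors = [
    (0.2, "a dog runs on the beach"),
    (0.3, "a dog plays on the sand"),
    (0.9, "a red car parked outside"),
]
word_probs = estimate_word_distribution(neighbors)
best = extract_caption(neighbors)
print(best)
```

Scoring by average word probability is a simple stand-in for an extractive summarization criterion; other selection rules could be substituted without changing the density estimation step.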
Citations
Pattern Recognition and Machine Learning
Christopher M. Bishop
- 01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models in the context of machine learning and classification.
10.1K
Multimodal Machine Learning: A Survey and Taxonomy
TL;DR: This paper surveys recent advances in multimodal machine learning and presents them in a common taxonomy to enable researchers to better understand the state of the field and identify directions for future research.
3.4K
•Posted Content
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen, Hao Fang, Tsung-Yi Lin, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollár, C. Lawrence Zitnick
TL;DR: The Microsoft COCO Caption dataset and evaluation server are described, and several popular metrics, including BLEU, METEOR, ROUGE and CIDEr, are used to score candidate captions.
Deep correlation for matching images and text
Fei Yan, Krystian Mikolajczyk
- 07 Jun 2015
TL;DR: This paper addresses the problem of matching images and captions in a joint latent space learnt with deep canonical correlation analysis (DCCA) by a GPU implementation and proposes methods to deal with overfitting.
Guiding the Long-Short Term Memory Model for Image Caption Generation
Xu Jia, Efstratios Gavves, Basura Fernando, Tinne Tuytelaars
- 07 Dec 2015
TL;DR: In this article, an extension of the LSTM model is proposed to add semantic information extracted from the image as extra input to each unit, with the aim of guiding the model towards solutions that are more tightly coupled to the image content.
498
References
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu
- 06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
•Book
Pattern Recognition and Machine Learning
Christopher M. Bishop
- 17 Aug 2006
TL;DR: This book covers probability distributions, linear models for regression, linear models for classification, neural networks, graphical models, mixture models and EM, sampling methods, continuous latent variables, and sequential data.
Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope
Aude Oliva, Antonio Torralba
TL;DR: The performance of the spatial envelope model shows that specific information about object shape or identity is not a requirement for scene categorization and that modeling a holistic representation of the scene informs about its probable semantic category.
7.5K