Open Access Proceedings Article
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
Mateusz Malinowski, Mario Fritz
- 08 Dec 2014
- Vol. 27, pp 1682-1690
TL;DR: In this paper, the authors combine discrete reasoning with uncertain predictions through a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework. The approach can handle highly complex human questions about realistic scenes and replies with a range of answers such as counts, object classes, instances, and lists of them.
Abstract: We propose a method for automatically answering questions about images by bringing together recent advances from natural language processing and computer vision. We combine discrete reasoning with uncertain predictions by a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework. Our approach can handle human questions of high complexity about realistic scenes and replies with a range of answers such as counts, object classes, instances, and lists of them. The system is directly trained from question-answer pairs. We establish a first benchmark for this task that can be seen as a modern attempt at a visual Turing test.
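The multi-world idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes that each image segment comes with an independent distribution over class labels, that a "world" is one joint labeling of all segments, and that the question has already been parsed into a counting query. The function name `answer_distribution` and the example segments are hypothetical. The answer is computed deterministically in every world and weighted by that world's probability, so the final distribution marginalizes over the uncertain perception.

```python
from collections import defaultdict
from itertools import product


def answer_distribution(segments, target):
    """Marginalize a counting answer over all possible worlds.

    segments: list of dicts mapping class label -> probability,
              one dict per image segment (assumed independent).
    target:   the class to count, e.g. "chair".
    """
    dist = defaultdict(float)
    # Each world picks exactly one (label, probability) pair per segment.
    for world in product(*(seg.items() for seg in segments)):
        p_world = 1.0
        count = 0
        for label, p in world:
            p_world *= p               # probability of this joint labeling
            count += (label == target)  # discrete reasoning inside the world
        dist[count] += p_world         # sum over worlds giving the same answer
    return dict(dist)


# Hypothetical uncertain segmentation of a scene with three segments.
segments = [
    {"chair": 0.9, "table": 0.1},
    {"chair": 0.6, "sofa": 0.4},
    {"table": 1.0},
]

dist = answer_distribution(segments, "chair")
best = max(dist, key=dist.get)  # most probable answer to "how many chairs?"
print(dist, best)
```

Exhaustive enumeration grows exponentially with the number of segments, so a practical system would more plausibly sample a limited number of worlds; the enumeration here only keeps the example deterministic.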
Citations
Unifying the Video and Question Attentions for Open-Ended Video Question Answering
Hongyang Xue, Zhou Zhao, Deng Cai
TL;DR: This paper builds a data set for open-ended Video-QA using automatic question generation, and proposes sequential video attention and temporal question attention models that are integrated into a unified attention model.
Visual Question Answering: A Tutorial
Damien Teney, Qi Wu, Anton van den Hengel
TL;DR: VQA constitutes a test for deep visual understanding and a benchmark for general artificial intelligence (AI). Although the field has seen recent successes, VQA remains a largely unsolved task.
VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing
Zhiyuan Chang, Mingyang Li, Junjie Wang, Cheng Li, Qing Wang
TL;DR: VEglue is a novel object-aligned joint erasing approach for testing VE systems that effectively detects issues and improves model performance.
•Posted Content
MUREL: Multimodal Relational Reasoning for Visual Question Answering
TL;DR: MuRel introduces an atomic reasoning primitive that represents interactions between the question and image regions with a rich vectorial representation and models region relations with pairwise combinations. It progressively refines visual and question interactions and can be used to define visualization schemes finer than mere attention maps.
Multi-Turn Video Question Answering via Hierarchical Attention Context Reinforced Networks
TL;DR: This paper proposes the hierarchical attention context network for context-aware question understanding by modeling the hierarchically sequential conversation context structure and develops the reinforced decoder network to generate the open-ended natural language answer for multi-turn video question answering.
References
•Book
Fuzzy sets
Lotfi A. Zadeh
- 01 Aug 1996
TL;DR: A separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.
53.2K
WordNet: a lexical database for English
TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
16.9K
•Book
Introduction to Information Retrieval
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze
- 01 Jan 2008
TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Indoor segmentation and support inference from RGBD images
Nathan Silberman, Derek Hoiem, Pushmeet Kohli, Rob Fergus
- 07 Oct 2012
TL;DR: The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.
Related Papers (5)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
Jeffrey Pennington, Richard Socher, Christopher D. Manning
- 01 Oct 2014