A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input

Open Access • Proceedings Article

- 08 Dec 2014

- Vol. 27, pp 1682-1690

555

TL;DR: In this paper, the authors combine discrete reasoning with uncertain predictions through a multi-world approach that represents uncertainty about the perceived world in a Bayesian framework. The resulting system can handle highly complex human questions about realistic scenes and replies with a range of answers such as counts, object classes, instances, and lists of them.
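
One way to picture the multi-world idea in the summary above is as marginalization over sampled interpretations of the scene. The following Python sketch is illustrative only and is not the authors' implementation: the per-segment label distributions, the function names (`sample_world`, `count_objects`, `answer_distribution`), and the use of plain Monte Carlo sampling are all assumptions introduced here.

```python
import random
from collections import Counter


def sample_world(detections, rng):
    """Draw one possible world: one label per segment, sampled from its class distribution.

    `detections` is a hypothetical input format: a list of dicts mapping label -> probability.
    """
    world = []
    for label_probs in detections:
        labels = list(label_probs)
        weights = [label_probs[label] for label in labels]
        world.append(rng.choices(labels, weights=weights, k=1)[0])
    return world


def count_objects(world, target):
    """Discrete reasoning step for a counting question such as 'how many chairs?'."""
    return sum(1 for label in world if label == target)


def answer_distribution(detections, question_fn, num_worlds=1000, seed=0):
    """Approximate P(answer) by evaluating the question in many sampled worlds."""
    rng = random.Random(seed)
    votes = Counter(
        question_fn(sample_world(detections, rng)) for _ in range(num_worlds)
    )
    return {answer: n / num_worlds for answer, n in votes.items()}


if __name__ == "__main__":
    # Three segments with uncertain labels (made-up numbers for illustration).
    detections = [
        {"chair": 0.9, "table": 0.1},
        {"chair": 0.6, "sofa": 0.4},
        {"table": 1.0},
    ]
    dist = answer_distribution(detections, lambda w: count_objects(w, "chair"))
    for answer, prob in sorted(dist.items()):
        print(f"{answer} chairs: {prob:.3f}")
```

Each sampled world is a fully discrete scene description, so the question can be answered with ordinary symbolic logic. The uncertainty shows up only in how often each answer occurs across worlds.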

Citations

Journal Article•10.1109/TIP.2017.2746267

Unifying the Video and Question Attentions for Open-Ended Video Question Answering

Hongyang Xue, +2 more

- 29 Aug 2017

- IEEE Transactions on Image Processing

TL;DR: This paper proposes a dataset for open-ended Video-QA built with automatic question generation approaches, and proposes sequential video attention and temporal question attention models, which are integrated into a unified attention model.

Journal Article•10.1109/msp.2017.2739826

Visual Question Answering: A Tutorial

Damien Teney, +2 more

- 09 Nov 2017

- IEEE Signal Processing Magazine

TL;DR: VQA constitutes a test for deep visual understanding and a benchmark for general artificial intelligence (AI). While the field of VQA has seen recent successes, it remains a largely unsolved task.

Journal Article•10.48550/arxiv.2403.02581

VEglue: Testing Visual Entailment Systems via Object-Aligned Joint Erasing

Zhiyuan Chang, +4 more

- 05 Mar 2024

- arXiv.org

TL;DR: VEglue is a novel object-aligned joint erasing approach for testing VE systems that effectively detects issues and improves model performance.

•Posted Content

MUREL: Multimodal Relational Reasoning for Visual Question Answering

Remi Cadene, +3 more

- 25 Feb 2019

- arXiv: Computer Vision and Pattern Recog...

TL;DR: MuRel introduces an atomic reasoning primitive that represents interactions between the question and image regions with a rich vectorial representation and models region relations with pairwise combinations. It progressively refines visual and question interactions and can be used to define visualization schemes finer than mere attention maps.

Journal Article•10.1109/TIP.2019.2902106

Multi-Turn Video Question Answering via Hierarchical Attention Context Reinforced Networks

Zhou Zhao, +3 more

- 27 Feb 2019

- IEEE Transactions on Image Processing

TL;DR: This paper proposes the hierarchical attention context network for context-aware question understanding by modeling the hierarchically sequential conversation context structure and develops the reinforced decoder network to generate the open-ended natural language answer for multi-turn video question answering.


References

•Book

Fuzzy sets

Lotfi A. Zadeh

- 01 Aug 1996

TL;DR: A separation theorem for convex fuzzy sets is proved without requiring that the fuzzy sets be disjoint.

53.2K

Journal Article•10.1145/219717.219748

WordNet: a lexical database for English

George A. Miller

- 01 Nov 1995

- Communications of The ACM

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.

16.9K

•Book

Introduction to Information Retrieval

Christopher D. Manning, +2 more

- 01 Jan 2008

TL;DR: This book presents an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.

13.1K

Journal Article•10.2307/2529486

Exploratory data analysis

F. N. David, +1 more

- 01 Dec 1977

- Biometrics

12.6K

Book Chapter•10.1007/978-3-642-33715-4_54

Indoor segmentation and support inference from RGBD images

Nathan Silberman, +3 more

- 07 Oct 2012

TL;DR: The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.

6.8K

Related Papers (5)

VQA: Visual Question Answering

Microsoft COCO: Common Objects in Context

Long short-term memory

Deep Residual Learning for Image Recognition

Glove: Global Vectors for Word Representation