Interpretable Counting for Visual Question Answering

Open Access • Posted Content

- 23 Dec 2017

27

TL;DR: The model sequentially selects from detected objects, learning interactions between objects that influence subsequent selections, and outperforms the state-of-the-art architecture for VQA on multiple metrics that evaluate counting.
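
The sequential selection described in the TL;DR can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `count_by_selection`, the per-object `scores`, the pairwise `interaction` matrix, and the fixed `stop_score` threshold are all hypothetical names introduced here. Each step picks the highest-scoring remaining object, then shifts the remaining scores by that object's interaction row, so an earlier pick can suppress or promote later ones. The count is the number of picks made before no remaining score beats the stop threshold.

```python
from typing import List, Tuple


def count_by_selection(
    scores: List[float],
    interaction: List[List[float]],
    stop_score: float = 0.0,
) -> Tuple[int, List[int]]:
    """Greedy sequential counting sketch (hypothetical, for illustration only).

    scores[i]         -- assumed initial score of detected object i
    interaction[i][j] -- assumed shift applied to object j's score once i is selected
    stop_score        -- selection ends when no remaining score exceeds this value
    """
    current = list(scores)  # copy so the caller's list is not modified
    remaining = set(range(len(current)))
    selected: List[int] = []

    while remaining:
        # Pick the remaining object with the highest current score.
        best = max(remaining, key=lambda i: current[i])
        if current[best] <= stop_score:
            break  # stopping is preferred over every remaining object
        selected.append(best)
        remaining.remove(best)
        # The selection influences the scores of all objects still available.
        for j in remaining:
            current[j] += interaction[best][j]

    return len(selected), selected


if __name__ == "__main__":
    # Objects 0 and 1 stand for overlapping detections of the same thing,
    # so selecting either one strongly suppresses the other.
    scores = [2.0, 1.5, 1.0]
    interaction = [
        [0.0, -3.0, 0.0],
        [-3.0, 0.0, 0.0],
        [0.0, 0.0, 0.0],
    ]
    count, picks = count_by_selection(scores, interaction)
    print(count, picks)
```

Returning the selected indices alongside the count is a deliberate choice in this sketch: each counted object can be traced back to a specific detection, which is the sense in which a selection-based count can be inspected.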

Citations

•Proceedings Article•10.1109/CVPR42600.2020.01028

In Defense of Grid Features for Visual Question Answering

Huaizu Jiang, +4 more

- 14 Jun 2020

TL;DR: This paper revisits grid features for VQA, and finds they can work surprisingly well -- running more than an order of magnitude faster with the same accuracy (e.g. if pre-trained in a similar fashion).

474

•Journal Article•10.1609/AAAI.V33I01.33018876

KVQA: Knowledge-Aware Visual Question Answering

Sanket Shah, +3 more

- 17 Jul 2019

TL;DR: KVQA is introduced as the first dataset for the task of (world) knowledge-aware VQA and the largest dataset for exploring VQA over large Knowledge Graphs (KG); it consists of 183K question-answer pairs involving more than 18K named entities and 24K images.

196

•Proceedings Article•10.1109/CVPR42600.2020.01459

Hypergraph Attention Networks for Multimodal Learning

Eun-Sol Kim, +4 more

- 14 Jun 2020

TL;DR: Qualitative analysis on two Visual Question Answering datasets shows that aligning the information levels between the modalities is important, and that symbolic graphs are a very powerful way to represent the information of the low-level signals in alignment.

102

•Journal Article•10.1109/TPAMI.2019.2943456

Interpretable Visual Question Answering by Reasoning on Dependency Trees

Qingxing Cao, +3 more

- 01 Mar 2021

- IEEE Transactions on Pattern Analysis an...

TL;DR: A novel neural network model is proposed that performs global reasoning on a dependency tree parsed from the question, yielding an interpretable visual question answering (VQA) system that gradually derives image cues following question-driven parse-tree reasoning.

69

•Journal Article•10.1016/j.inffus.2021.07.009

Multimodal research in vision and language: A review of current and emerging trends

- 01 Jan 2022

- Information Fusion

TL;DR: This paper presents a detailed overview of the latest trends in research on the visual and language modalities; the authors look at applications through their task formulations and at how to solve various problems related to semantic perception and content generation.

61

References

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: Adam is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

•Proceedings Article•10.3115/V1/D14-1162

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

- 01 Oct 2014

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

41.6K

•Proceedings Article•10.1109/CVPR.2014.81

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, +3 more

- 23 Jun 2014

TL;DR: R-CNN applies CNNs to bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.

33.7K

•Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 04 Jun 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) that generates high-quality region proposals, which are used by Fast R-CNN for detection.

25.3K

•Posted Content

Fast R-CNN

Ross Girshick

- 30 Apr 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection that builds on previous work to efficiently classify object proposals using deep convolutional networks.

20.3K

Related Papers (5)

Interpretable Counting for Visual Question Answering

An Improved Attention and Hybrid Optimization Technique for Visual Question Answering

DualNet: Domain-invariant network for visual question answering

Exploiting hierarchical visual features for visual question answering

Visual Question Answering with Question Representation Update (QRU)