Open Access · Posted Content
Unshuffling Data for Improved Generalization
TL;DR: This work describes a training procedure that captures the patterns that are stable across environments while discarding spurious ones, and demonstrates multiple use cases on visual question answering (VQA), a task notorious for dataset biases.
Abstract: Generalization beyond the training distribution is a core challenge in machine learning. The common practice of mixing and shuffling examples when training neural networks may not be optimal in this regard. We show that partitioning the data into well-chosen, non-i.i.d. subsets treated as multiple training environments can guide the learning of models with better out-of-distribution generalization. We describe a training procedure to capture the patterns that are stable across environments while discarding spurious ones. The method takes a step beyond correlation-based learning: the choice of the partitioning allows injecting information about the task that cannot otherwise be recovered from the joint distribution of the training data. We demonstrate multiple use cases on the task of visual question answering (VQA), which is notorious for dataset biases. We obtain significant improvements on VQA-CP using environments built from prior knowledge, existing metadata, or unsupervised clustering. We also obtain improvements on GQA using annotations of "equivalent questions", and on multi-dataset training (VQA v2 / Visual Genome) by treating the two datasets as distinct environments.
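The abstract combines two ingredients: splitting the training set into environments (one listed option is unsupervised clustering) and an objective that favors patterns stable across those environments. The sketch below illustrates both under stated assumptions and is not the paper's procedure: it clusters precomputed feature vectors with plain k-means, and, as a stand-in for the paper's own regularizer, it penalizes the variance of per-environment risks of a linear classifier (a V-REx-style penalty). The function names `kmeans_environments`, `split_by_environment`, `logistic_risk`, and `multi_env_objective` are hypothetical.

```python
import numpy as np


def kmeans_environments(features, n_envs, n_iters=20, seed=0):
    """Assign each example to one of n_envs environments with plain k-means."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers with distinct random examples (fancy indexing copies).
    centers = features[rng.choice(len(features), n_envs, replace=False)]
    for _ in range(n_iters):
        # Squared distance of every example to every center: shape (N, n_envs).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_envs):
            members = features[labels == k]
            if len(members) > 0:  # keep the previous center if a cluster empties
                centers[k] = members.mean(axis=0)
    return labels


def split_by_environment(X, y, env_labels):
    """Group (X, y) into a list of per-environment datasets."""
    return [(X[env_labels == e], y[env_labels == e]) for e in np.unique(env_labels)]


def logistic_risk(w, X, y):
    """Mean binary cross-entropy of a linear classifier; y holds 0/1 labels."""
    z = X @ w
    # -[y log s(z) + (1 - y) log(1 - s(z))] == log(1 + e^z) - y*z, computed stably
    return np.mean(np.logaddexp(0.0, z) - y * z)


def multi_env_objective(w, envs, penalty_weight):
    """Average per-environment risk plus penalty_weight times the risks' variance.

    Assumption: a V-REx-style variance penalty, not the paper's regularizer.
    """
    risks = np.array([logistic_risk(w, X, y) for X, y in envs])
    return risks.mean() + penalty_weight * risks.var()
```

With `penalty_weight=0` the objective reduces to the average risk over environments; raising it favors weights whose loss is similar in every environment, which is one common way of operationalizing "stable across environments". Which features to cluster (for example, question embeddings) is left open here and is an assumption of any concrete use.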
Citations
Posted Content
In Search of Lost Domain Generalization
Ishaan Gulrajani, David Lopez-Paz
TL;DR: This paper implements DomainBed, a testbed for domain generalization including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria, and finds that, when carefully implemented, empirical risk minimization shows state-of-the-art performance across all datasets.
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua, Ji-Rong Wen
01 Jun 2021
TL;DR: The authors propose a counterfactual inference framework to mitigate language bias in VQA models: it captures the language bias as the direct causal effect of questions on answers and reduces it by subtracting this direct language effect from the total causal effect.
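The subtraction described in this TL;DR can be made concrete with a small worked example. It is a simplified sketch under our own assumptions: each counterfactual prediction is treated as a plain answer-score vector, and how the cited paper obtains scores with an input "blocked" (and how it fuses its branches) is not reproduced. The function name `debiased_answer_scores` and all numbers are hypothetical.

```python
import numpy as np


def debiased_answer_scores(z_full, z_question_only, z_no_input):
    """Total indirect effect (TIE) of the inputs on each answer score.

    z_full:          scores with the real question and image
    z_question_only: scores with the real question but the image blocked
    z_no_input:      scores with both question and image blocked
    """
    total_effect = z_full - z_no_input                      # TE
    direct_language_effect = z_question_only - z_no_input   # NDE of the question
    return total_effect - direct_language_effect            # TIE = TE - NDE


z_full = np.array([2.0, 1.5])           # answer 0 wins, helped by the language prior
z_question_only = np.array([1.8, 0.2])  # the question alone already favors answer 0
z_no_input = np.array([0.5, 0.5])
tie = debiased_answer_scores(z_full, z_question_only, z_no_input)
print(tie.argmax())  # → 1
```

Algebraically the reference term `z_no_input` cancels, so TIE equals `z_full - z_question_only`; the prediction flips to the answer that is supported by the image rather than by the question alone.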
Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson
06 Apr 2022
TL;DR: It is demonstrated that simply retraining the last layer of large ImageNet-trained models can match or outperform state-of-the-art approaches on spurious-correlation benchmarks, at far lower complexity and computational cost.
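Last-layer retraining can be sketched in a few lines. This is a minimal sketch under stated assumptions: features come from a frozen backbone and are precomputed, the new head is a logistic regression fitted by plain gradient descent, and it is trained on a group-balanced subset. The group balancing reflects our reading of the cited paper's Deep Feature Reweighting (DFR) method, not anything stated in the TL;DR. All function names, data, and hyperparameters are hypothetical.

```python
import numpy as np


def group_balanced_indices(groups, seed=0):
    """Subsample indices so that every group contributes equally many examples."""
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    unique = np.unique(groups)
    n_per_group = min(int((groups == g).sum()) for g in unique)
    picked = [rng.choice(np.flatnonzero(groups == g), n_per_group, replace=False)
              for g in unique]
    return np.concatenate(picked)


def retrain_last_layer(features, labels, lr=0.5, steps=500):
    """Fit a fresh logistic-regression head on frozen features (backbone untouched)."""
    X = np.hstack([features, np.ones((len(features), 1))])  # append a bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))          # predicted probability of class 1
        w -= lr * X.T @ (p - labels) / len(labels)  # gradient of mean cross-entropy
    return w


# Hypothetical 1-D "frozen embeddings"; groups mimic (class, spurious attribute) pairs.
feats = np.array([[-2.0], [-1.0], [1.0], [2.0], [1.5]])
labels = np.array([0, 0, 1, 1, 1])
groups = np.array([0, 1, 2, 3, 3])
idx = group_balanced_indices(groups)
w = retrain_last_layer(feats[idx], labels[idx])
```

Only the head's weights `w` are learned; the backbone that produced `feats` is never updated, which is what keeps the approach cheap.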
Counterfactual Vision and Language Learning
Ehsan Abbasnejad, Damien Teney, Amin Parvaneh, Javen Shi, Anton van den Hengel
14 Jun 2020
TL;DR: This work addresses visual question answering by introducing counterfactuals during training, and shows that simulating plausible alternative training data in this way results in better generalization.