Journal Article
Improving Systematic Generalization Through Modularity and Augmentation
Laura Ruis, Brenden M. Lake
TL;DR: This work investigates how two well-known modeling principles, modularity and data augmentation, affect systematic generalization of neural networks in grounded language learning, and analyzes how large the vocabulary needs to be to achieve systematic generalization and how similar the augmented data needs to be to the problem at hand.
Abstract: Systematic generalization is the ability to combine known parts into novel meaning; an important aspect of efficient human learning, but a weakness of neural network learning. In this work, we investigate how two well-known modeling principles -- modularity and data augmentation -- affect systematic generalization of neural networks in grounded language learning. We analyze how large the vocabulary needs to be to achieve systematic generalization and how similar the augmented data needs to be to the problem at hand. Our findings show that even in the controlled setting of a synthetic benchmark, achieving systematic generalization remains very difficult. After training on an augmented dataset with almost forty times more adverbs than the original problem, a non-modular baseline is not able to systematically generalize to a novel combination of a known verb and adverb. When separating the task into cognitive processes like perception and navigation, a modular neural network is able to utilize the augmented data and generalize more systematically, achieving 70% and 40% exact match increase over state-of-the-art on two gSCAN tests that have not previously been improved. We hope that this work gives insight into the drivers of systematic generalization, and what we still need to improve for neural networks to learn more like humans do.
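The abstract reports results as exact match. As a minimal sketch, assuming the common definition in which a predicted action sequence counts as correct only if it is identical to the target sequence, the metric could be computed as below. The function names `exact_match` and `exact_match_accuracy` are hypothetical and are not taken from the paper's code.

```python
from typing import Sequence


def exact_match(predicted: Sequence[str], target: Sequence[str]) -> bool:
    """Return True only if both sequences have the same actions in the same order."""
    return list(predicted) == list(target)


def exact_match_accuracy(
    predictions: Sequence[Sequence[str]],
    targets: Sequence[Sequence[str]],
) -> float:
    """Percentage of examples whose predicted sequence matches the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not predictions:
        return 0.0
    hits = sum(exact_match(p, t) for p, t in zip(predictions, targets))
    return 100.0 * hits / len(predictions)


# Illustrative data only: one of the two predictions matches its target.
example_predictions = [["walk", "walk"], ["turn left", "walk"]]
example_targets = [["walk", "walk"], ["turn right", "walk"]]
example_score = exact_match_accuracy(example_predictions, example_targets)
```

Under this definition a single wrong action anywhere in the sequence makes the whole example count as incorrect, which is why the metric is strict for long action sequences.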
Figures

Figure 1: This figure depicts the two tests of adverb compositionality in gSCAN. Figure (a) denotes a few-shot learning test; a model has access to few (k) examples of how the adverb “cautiously” translates to an output sequence and needs to generalize to all other examples. Figure (b) denotes the “pull while spinning”-test; reminiscent of the “cycle cautiously”-example, a model learns all examples of pushing while spinning or walking while spinning, and is tested on its ability to interpret “pull while spinning”. 
Figure 2: The input command (“Push a circle cautiously.”) and world state are processed by different modules, each dealing with a different question about the input task. The final output is produced by the transformation module. *: cautious is in reality not a primitive action but a sequence of “turn left turn right turn right turn left”
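The footnote to Figure 2 says that "cautiously" corresponds to the action sequence "turn left turn right turn right turn left". A rough sketch of how such an adverb could transform a plain action sequence is given below. The caption only specifies the sequence itself; inserting it before every movement action, the set of movement actions, and the name `apply_cautiously` are all assumptions made for illustration.

```python
from typing import List

# Sequence given in the Figure 2 footnote.
CAUTIOUS_PREFIX = ["turn left", "turn right", "turn right", "turn left"]

# Assumption: these are the actions that count as a movement step.
MOVEMENT_ACTIONS = {"walk", "push", "pull"}


def apply_cautiously(actions: List[str]) -> List[str]:
    """Insert the cautious sequence before each movement action (assumed placement)."""
    result: List[str] = []
    for action in actions:
        if action in MOVEMENT_ACTIONS:
            result.extend(CAUTIOUS_PREFIX)
        result.append(action)
    return result


# Illustrative input: two walking steps separated by a turn.
plain = ["walk", "turn left", "walk"]
cautious = apply_cautiously(plain)
```

In this sketch the adverb is a fixed rewrite rule over the output sequence, which is the kind of transformation a model must learn to apply to verbs it has never seen paired with that adverb.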
Citations
Instilling Inductive Biases with Subnetworks
Enyan Zhang,Michael A. Lepori,Ellie Pavlick +2 more
TL;DR: This work discovers a functional subnetwork that implements a particular subtask within a trained model and uses it to instill inductive biases towards solutions utilizing that subtask, and demonstrates its effectiveness with two experiments.
Learn to Compose Syntactic and Semantic Representations Appropriately for Compositional Generalization
TL;DR: The authors propose Composition (Compose Syntactic and Semantic Representations), an extension to Seq2Seq models to learn to compose representations of different encoder layers appropriately for generating different keys and values passing into different decoder layers through introducing a composed layer between the encoder and decoder.
Meta-learning from relevant demonstrations can improve compositional generalization
TL;DR: The proposed architecture can significantly improve the generalization capabilities of the agent on one of the most difficult gSCAN splits: the “adverb-to-verb” Split H.
Meta-learning from relevant demonstrations improves compositional generalization
Sam Spilsbury,Alexander Ilin +1 more
TL;DR: In this article, a meta-sequence-to-sequence learning approach is proposed to improve the generalization of language-instructed agents in gSCAN, where the agent receives as context a few examples of pairs of instructions and action trajectories in a given instance of the environment (a support set) and is tasked with predicting an action sequence for a query instruction in the same environment instance.
Improved Compositional Generalization by Generating Demonstrations for Meta-Learning
Sam Spilsbury,Alexander Ilin +1 more
TL;DR: The authors consider a grounded language learning problem (gSCAN) where good support examples for certain test splits might not even exist in the training data, or would be infeasible to search for.