TL;DR: The analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.
Abstract: We examine the role of memorization in deep learning, drawing connections to capacity, generalization, and adversarial robustness. While deep networks are capable of memorizing noise data, our results suggest that they tend to prioritize learning simple patterns first. In our experiments, we expose qualitative differences in gradient-based optimization of deep neural networks (DNNs) on noise vs. real data. We also demonstrate that with appropriately tuned explicit regularization (e.g., dropout) we can degrade DNN training performance on noise datasets without compromising generalization on real data. Our analysis suggests that dataset-independent notions of effective capacity are unlikely to explain the generalization performance of deep networks trained with gradient-based methods, because the training data itself plays an important role in determining the degree of memorization.
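The contrast between noise and real data can be made concrete by building a "noise" copy of a labelled dataset. The abstract does not say exactly how its noise datasets were constructed; the sketch below assumes one common variant, replacing a chosen fraction of labels with classes drawn uniformly at random. The function name `randomize_labels` and its parameters are hypothetical, introduced only for illustration.

```python
import random
from typing import List, Sequence


def randomize_labels(
    labels: Sequence[int],
    num_classes: int,
    noise_fraction: float,
    seed: int = 0,
) -> List[int]:
    """Return a copy of `labels` in which a `noise_fraction` of the entries
    is replaced by a class drawn uniformly from range(num_classes).

    Hypothetical helper: one assumed way to build a "noise" dataset.
    """
    if not 0.0 <= noise_fraction <= 1.0:
        raise ValueError("noise_fraction must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the corruption is reproducible
    noisy = list(labels)
    n_noisy = round(noise_fraction * len(noisy))
    # Choose which examples to corrupt, without replacement.
    for i in rng.sample(range(len(noisy)), n_noisy):
        # The random class may coincide with the original label by chance.
        noisy[i] = rng.randrange(num_classes)
    return noisy
```

With `noise_fraction=1.0` the labels carry no information about the inputs, so any training accuracy above chance on such data must come from memorization rather than from learning shared patterns.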
TL;DR: This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models (a common type of machine-learning model), as well as new, efficient procedures that can extract unique, secret sequences such as credit card numbers.
Abstract: This paper describes a testing methodology for quantitatively assessing the risk that rare or unique training-data sequences are unintentionally memorized by generative sequence models, a common type of machine-learning model. Because such models are sometimes trained on sensitive data (e.g., the text of users' private messages), this methodology can benefit privacy by allowing deep-learning practitioners to select means of training that minimize such memorization.
In experiments, we show that unintended memorization is a persistent, hard-to-avoid issue that can have serious consequences. Specifically, for models trained without consideration of memorization, we describe new, efficient procedures that can extract unique, secret sequences, such as credit card numbers. We show that our testing strategy is a practical and easy-to-use first line of defense, e.g., by describing its application to quantitatively limit data exposure in Google's Smart Compose, a commercial text-completion neural network trained on millions of users' email messages.
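The abstract does not spell out its memorization metric. As an illustration only, the sketch below assumes a rank-based "exposure" score: a random secret (a "canary") is inserted into the training data, every candidate from the same format is scored by the trained model's log-perplexity, and exposure is log2 of the candidate-space size minus log2 of the canary's rank. The function name, the dictionary input, and the tie-breaking rule are assumptions of this sketch, not details taken from the paper.

```python
import math
from typing import Mapping


def exposure(canary: str, log_perplexity: Mapping[str, float]) -> float:
    """Rank-based exposure of `canary` among all candidate sequences.

    `log_perplexity` maps every candidate in the randomness space
    (including the inserted canary) to the model's log-perplexity;
    lower values mean the model finds that sequence more likely.
    Hypothetical helper for illustration.
    """
    if canary not in log_perplexity:
        raise KeyError("the canary must be one of the scored candidates")
    canary_score = log_perplexity[canary]
    # Rank = number of candidates at least as likely as the canary (the
    # canary counts itself). Ties place the canary last among equals,
    # an assumption of this sketch.
    rank = sum(1 for s in log_perplexity.values() if s <= canary_score)
    return math.log2(len(log_perplexity)) - math.log2(rank)
```

For example, if a 4-digit PIN canary is the single most likely of its 10,000 candidates, the sketch returns log2(10000), about 13.3 bits; if the model scores all candidates equally, it returns 0.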
TL;DR: It is argued that the declarative/procedural model provides a new framework for the study of lexicon and grammar.
Abstract: What are the psychological, computational and neural underpinnings of language? Are these neurocognitive correlates dedicated to language? Do different parts of language depend on distinct neurocognitive systems? Here I address these and other issues that are crucial for our understanding of two fundamental language capacities: the memorization of words in the mental lexicon, and the rule-governed combination of words by the mental grammar. According to the declarative/procedural model, the mental lexicon depends on declarative memory and is rooted in the temporal lobe, whereas the mental grammar involves procedural memory and is rooted in the frontal cortex and basal ganglia. I argue that the declarative/procedural model provides a new framework for the study of lexicon and grammar.
TL;DR: In this paper, rote versus meaningful learning is discussed in the context of Bloom's taxonomy and its application in theory-into-practice (T2P) settings.
Abstract: (2002). Rote Versus Meaningful Learning. Theory Into Practice, Vol. 41 (Revising Bloom's Taxonomy), pp. 226–232.
TL;DR: Successful performance at complex thinking may rely on limited regulatory resources; to test this, depletion of the self's regulatory resources was manipulated by having some participants initially regulate attention or emotion.
Abstract: Some complex thinking requires active guidance by the self, but simpler mental activities do not. Depletion of the self's regulatory resources should therefore impair the former and not the latter. Resource depletion was manipulated by having some participants initially regulate attention (Studies 1 and 3) or emotion (Study 2). As compared with no-regulation participants who did not perform such exercises, depleted participants performed worse at logic and reasoning (Study 1), cognitive extrapolation (Study 2), and a test of thoughtful reading comprehension (Study 3). The same manipulations failed to cause decrements on a test of general knowledge (Study 2) or on memorization and recall of nonsense syllables (Study 3). Successful performance at complex thinking may therefore rely on limited regulatory resources.