TL;DR: This work resolves a learnability paradox concerning argument structure and the lexicon, treats argument structure as a pointer between syntactic structure and propositions, and discusses the autonomy of semantic representation, implications for the semantic bootstrapping hypothesis, conservatism, listedness and the lexicon, and spatial schemas and abstract thought.
Abstract: Part 1 A learnability paradox: argument structure and the lexicon; the logical problem of language acquisition; Baker's paradox; attempted solutions to Baker's paradox. Part 2 Constraints on lexical rules: morphological and phonological constraints; semantic constraints; how semantic and morphological constraints might resolve Baker's paradox; evidence for criteria-governed productivity; problems for the criteria-governed productivity theory. Part 3 Constraints and the nature of argument structure: overview - why lexical rules carry semantic constraints; constraints of lexical rules as manifestations of more general phenomena; a theory of argument structure; on universality. Part 4 Possible and actual forms: the problem of negative exceptions; transitive action verbs as evidence for narrow subclasses; the nature of narrow conflation classes; defining and motivating subclasses of verbs licensing the four alternations; the relation between narrow-range and broad-range rules. Part 5 Representation: the need for a theory of lexicosemantic representation; is a theory of lexical semantics feasible?; evidence for a semantic subsystem underlying verb meanings; a cross-linguistic inventory of components of verb meaning; a theory of the representation of grammatically relevant semantic structures; explicit representations of lexical rules and lexicosemantic structures; summary. Part 6 Learning: linking rules; lexical semantic structures; broad conflation classes (thematic cores) and broad-range lexical rules; summary of learning mechanisms. Part 7 Development: developmental sequence for argument structure alternations; the unlearning problem; children's argument structure changing rules are always semantically conditioned; do children's errors have the same cause as adults'?; acquisition of verb meaning and errors in argument structure; some predictions about the acquisition of narrow-range rules; summary of development.
Part 8 Conclusions: a brief summary of the resolution of the paradox; argument structure as a pointer between syntactic structure and propositions; the autonomy of semantic representation; implications for the semantic bootstrapping hypothesis; conservatism, listedness and the lexicon; spatial schemas and abstract thought.
TL;DR: This article reassesses the role of negative evidence in non-native language acquisition and argues that the grammar-building process cannot make use of negative evidence to restructure (interlanguage) grammars.
Abstract: This paper reassesses the role of Negative Evidence (NE) in nonnative language acquisition. We argue that the grammar-building process cannot make use of NE to restructure (Interlanguage) grammars ...
TL;DR: A general framework is proposed in which the notion of identifiability is given a precise formal definition similar to that of learnability, and the question of whether a tractable procedure exists for deciding whether a given relation is decomposable into a constraint network or a CNF theory with desirable topology is explored.
TL;DR: The article aims to explain the processes of natural selection, acquisition, and the evolution of language with the help of the genetic algorithm technique.
Abstract: Darwin’s theory of natural selection had an important influence on the Neogrammarians. Like Darwin, they believed that diachronic change was the result of selective pressures on organisms from the environment operating on random variation within a population. Learning theory must provide an account of how the learner’s search through the set of possible combinations of parameter values takes place, and of how certain values are chosen over others. Genetic algorithms mimic natural selection by representing hypotheses about a problem in a way that is similar to the way in which genetic material is represented. Hypotheses are then tested against the problem space, with the most fit hypotheses contributing to the formation of new hypotheses via reproduction. Clark provides a crude definition of improvement based on the ability to parse input sentences in terms of failed parses. A shifted system of parameter settings can be thought of as a marked system.
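As a rough illustration of the genetic-algorithm idea described in this abstract, the sketch below evolves binary parameter vectors. Everything specific in it is an assumption made for illustration: the `TARGET` vector, the `failed_parses` stand-in for a parser, the population size, and the mutation rate are hypothetical and are not taken from Clark's model.

```python
import random

# Hypothetical target grammar: one binary value per parameter (illustration only).
TARGET = (1, 0, 1, 1, 0, 0, 1, 0)

def failed_parses(hypothesis):
    """Stand-in for a parser: counts parameters that disagree with TARGET.

    A real learner would count input sentences it fails to parse; that
    component is not modelled here.
    """
    return sum(h != t for h, t in zip(hypothesis, TARGET))

def fitness(hypothesis):
    # Fewer failed parses means a fitter hypothesis.
    return len(TARGET) - failed_parses(hypothesis)

def crossover(a, b, rng):
    # One-point crossover: prefix from one parent, suffix from the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(h, rng, rate=0.05):
    # Flip each parameter independently with a small probability.
    return tuple(1 - bit if rng.random() < rate else bit for bit in h)

def evolve(pop_size=30, generations=60, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in TARGET) for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fitter half unchanged (elitism), then refill by reproduction.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, failed_parses(best))
```

Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next; whether the target is actually reached depends on the seed and the settings.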
TL;DR: The challenge in this field is to develop a theory that accounts for the learnability problems involved in the domain at hand, that makes correct empirical predictions, and leaves room for variation between children exactly in the domains where variation occurs.
Abstract: What is innate and what is learned is one of the questions underlying most current work in first language acquisition. The challenge in this field is to develop a theory that accounts for the learnability problems involved in the domain at hand, that makes correct empirical predictions, and leaves room for variation between children exactly in the domains where variation occurs.
TL;DR: It is concluded that island constraints are not fully innate: at least some children have to learn at least some facts about extractability. The strongest island constraints must be innate, and they must be progressively weakened by learners who encounter constructions that disobey them.
Abstract: Island constraints on extraction are not universal. In Slavic languages they are stronger than in English, and in Scandinavian languages they are weaker. At least this is so for extraction from clausal complements to verbs, which I will focus on in this paper. As a first approximation (inaccurate but adequate for purposes of this section): all complement clauses are islands in Slavic, only WH-clauses are islands in English, and not even WH-clauses are islands in Scandinavian. We can conclude that island constraints are not fully innate; at least some children have to learn at least some facts about extractability. We can also establish, by reference to the Subset Principle, WHICH children have to do the learning. It must be the children learning a more generous language like Swedish, rather than those learning a more restricted language like Polish. To determine who learns we consider who has the necessary data to learn from. Given the assumption (standard though not undisputed) that learners have no access to systematic negative input, it follows that language-specific facts about islands must be learnable from positive data alone, i.e., by hearing sentences of the language. So it must be the Swedish learners and the English learners who discover from their input that it is possible to extract from complement clauses. The Polish learners (and hence ALL children) must believe innately that complement clauses are islands. In general: the strongest island constraints must be innate, and they must be progressively weakened by learners who encounter constructions that disobey them.
TL;DR: A simple description logic is defined, some results on its expressive power are summarized, and its learnability is analyzed; it is shown that the full logic cannot be tractably learned; however, syntactic restrictions that enable tractable learning exist.
Abstract: This paper considers the learnability of subsets of first-order logic. Prior work has established two boundaries of learnability: Haussler [1989] has shown that conjunctions in first-order logic cannot be learned in the Valiant model, even if the form of the conjunction is highly restricted; on the other hand, Valiant [1984] has shown that propositional conjunctions are learnable. In this paper, we study the learnability of the restricted first-order logics known as description logics. Description logics are also subsets of predicate calculus, but are expressed using a different syntax, allowing a different set of syntactic restrictions to be explored. In this paper, we first define a simple description logic, summarize some results on its expressive power, and then analyze its learnability. It is shown that the full logic cannot be tractably learned; however, syntactic restrictions that enable tractable learning exist. The learnability results hold even if the alphabets of primitive classes and roles (over which descriptions are constructed) are infinite; our positive result thus generalizes not only the result of Valiant [1984] on learning monomials to learning concepts in our (conjunctive) first-order language, but also the result of Blum [1990] on learning monomials over infinite attribute spaces.
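To make the propositional baseline mentioned in this abstract concrete, here is a minimal sketch of the standard textbook elimination learner for Boolean conjunctions (monomials). It is offered only as an illustration of that baseline, not as the description-logic learner studied in the paper; the function names and the three-variable target are hypothetical.

```python
from itertools import product

def learn_conjunction(examples, n):
    """Elimination learner for conjunctions over n Boolean variables.

    A literal is a pair (index, value): (i, 1) means x_i, (i, 0) means NOT x_i.
    Start with all 2n literals and drop every literal contradicted by a
    positive example; negative examples are ignored.
    """
    literals = {(i, v) for i in range(n) for v in (0, 1)}
    for x, label in examples:
        if label:
            literals -= {(i, 1 - x[i]) for i in range(n)}
    return literals

def predict(literals, x):
    # The hypothesis is the conjunction of all surviving literals.
    return all(x[i] == v for i, v in literals)

# Hypothetical target: x0 AND NOT x2 over three variables.
target = lambda x: x[0] == 1 and x[2] == 0
examples = [(x, target(x)) for x in product((0, 1), repeat=3)]
hypothesis = learn_conjunction(examples, 3)
print(sorted(hypothesis))  # [(0, 1), (2, 0)]
```

Here the learner sees every assignment, so it recovers the target exactly; in the PAC setting it would instead see a random sample and return a hypothesis that is only probably approximately correct.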
Abstract: (1) Consider the measure space $([0, 1], \mathcal{B}([0, 1]), \lambda)$, where $\mathcal{B}([0, 1])$ is the restriction of the Borel $\sigma$-algebra to $[0, 1]$, and $\lambda$ is the restriction of Lebesgue measure to $[0, 1]$. Let $E_1, \dots, E_m$ be a collection of Borel measurable subsets of $[0, 1]$ such that every element $x \in [0, 1]$ belongs to at least $n$ sets in the collection $\{E_j\}_{j=1}^{m}$, where $n \le m$. Show that there exists a $j \in \{1, \dots, m\}$ such that $\lambda(E_j) \ge \frac{n}{m}$. (1.5 pt)
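One possible line of argument for this exercise, given as a sketch rather than as the intended solution, is to integrate the sum of the indicator functions of the sets:

```latex
% Each x in [0,1] lies in at least n of the sets, so pointwise
\[
  \sum_{j=1}^{m} \mathbf{1}_{E_j}(x) \;\ge\; n \qquad \text{for all } x \in [0,1].
\]
% Integrating against \lambda, with \lambda([0,1]) = 1:
\[
  \sum_{j=1}^{m} \lambda(E_j)
  = \int_{[0,1]} \sum_{j=1}^{m} \mathbf{1}_{E_j} \, d\lambda
  \;\ge\; n .
\]
% If every \lambda(E_j) were < n/m, the sum would be < n, a contradiction. Hence
\[
  \max_{1 \le j \le m} \lambda(E_j) \;\ge\; \frac{n}{m}.
\]
```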
TL;DR: The ideas from inductive reasoning are instantiated in alternative ways, and links are established between the various new constraints both among themselves as well as with other well-known constraints, such as conservativeness.
Abstract: Learnability of families of recursive languages from positive data is studied in the Gold paradigm of inductive inference, where the learner obeys certain constraints motivated by work in inductive reasoning. Previously, various notions of monotonicity have been defined in the context of language learning. These constraints require that the learner's guess monotonically ‘improves’ with regard to the target language. In this paper, the ideas from inductive reasoning are instantiated in alternative ways. Links are established between the various new constraints both among themselves as well as with other well-known constraints, such as conservativeness. Exactly learnable families are characterized for prudent learners which obey various combinations of these constraints. Applications of these characterizations are also shown.
TL;DR: In this thesis, a series of experiments was conducted with comprehensive school students learning French, aged 11-13, and the author concluded that the position of items in the list is not a reliable indicator of learnability, while word category and the presence of an English word embedded in a French word are promising indicators of word learnability.
Abstract: The learning of second language vocabulary in lists of word-pairs is a widespread practice despite the disapproval of many in the second language learning domain. There is an acknowledged mismatch between psychological theories on the one hand and techniques of vocabulary learning on the other. Psychology does not address the relevant issues directly and second language learning practice is often atheoretical and unprincipled. This thesis reviews aspects of psychology which appear to be relevant to second language vocabulary learning and their applicability. A series of experiments is conducted with comprehensive school students learning French, aged 11-13.
The first part of the study deals with the presentation of vocabulary items to be learned. Presenting items in the order First Language - Second Language is the more versatile form of presentation if both generation and comprehension are required on the part of the learner. The transferability of list learning to testing in a sentential context depends on the ability of the learner and the task involved. Higher-ability list learners are inhibited in a generation task but not in a comprehension task; the opposite is true for lower-ability learners. Learning in a context improves the performance of higher-ability learners in generation but makes little difference to lower-ability learners. An explanation is suggested in terms of transfer-appropriate processing. The position of items in the list is not a reliable indicator of learnability. Primacy, recency, and serial effects may be obtained but none of them is consistent. The same conclusion applies to different ways of presenting word-pairs.
The second part of the study examines aspects of word learnability. Objective word frequency is not a reliable indicator of learnability in this context. Word category and the presence of an English word embedded in a French word are promising indicators of learnability.
TL;DR: A new measure is defined that is a dual to the VC-dimension, called the testing dimension of a concept class, and it is shown how it yields untestability results for certain concept classes.
Abstract: A model for approximate testing of concepts, which relates to the PAC model of learning, has been developed. In this model an approximate testing algorithm produces a finite set of examples that distinguishes one concept from others that differ from it by more than a given error bound. This model corresponds closely to the helpful teacher learning model. In this paper we examine properties of a concept class that make it testable or untestable. We define a new measure that is a dual to the VC-dimension, called the testing dimension of a concept class, and show how it yields untestability results for certain concept classes.
TL;DR: It is shown that, when exactness is not required, prudence, consistency and responsiveness, even together, do not restrict the power of conservative learners.
Abstract: Language learnability is investigated in the Gold paradigm of inductive inference from positive data. Angluin gave a characterization of learnable families in this framework. Here, learnability of families of recursive languages is studied when the learner obeys certain natural constraints. Exactly learnable families are characterized for prudent learners with the following types of constraints: (0) conservative, (1) conservative and consistent, (2) conservative and responsive, and (3) conservative, consistent, and responsive. The class of learnable families is shown to strictly increase going from (3) to (2) and from (2) to (1), while it stays the same going from (1) to (0). It is also shown that, when exactness is not required, prudence, consistency and responsiveness, even together, do not restrict the power of conservative learners.
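The notions of a conservative and consistent learner can be illustrated with a small sketch of learning from positive data. This is only an assumed toy setting, not the characterization given in the paper: the family L_k = {0, ..., k} is hypothetical and is chosen so that picking the first language consistent with the data never overgeneralizes.

```python
def conservative_learner(family, text):
    """Yield a guess (index into `family`) after each positive example.

    Conservative: the guess changes only when the current hypothesis fails
    to contain some example seen so far. Consistent: every guess contains
    all examples seen so far (assuming some language in `family` does).
    """
    seen = set()
    guess = None
    for example in text:
        seen.add(example)
        if guess is None or not seen <= family[guess]:
            # Move to the first language in the enumeration covering the data.
            guess = next(i for i, language in enumerate(family) if seen <= language)
        yield guess

# Hypothetical family: L_k = {0, ..., k}, ordered by inclusion.
family = [set(range(k + 1)) for k in range(6)]
text = [0, 2, 1, 2, 4, 3, 4, 0]  # a finite prefix of a text for L_4
guesses = list(conservative_learner(family, text))
print(guesses)  # [0, 2, 2, 2, 4, 4, 4, 4]
```

The guess changes only at the second and fifth examples, where the data seen so far fall outside the current hypothesis, and it stabilizes on index 4 once the largest element of the target language has appeared.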
TL;DR: This work sharpens the intuitive notion of domain-specific knowledge by reviewing the alleged difference between processing limitations due to shortage of resources vs. shortage of knowledge, and suggests that a model is language specific if it transparently refers to entities and facts about language as opposed to entities and facts of more general mathematical domains.
Abstract: It is sometimes argued that if PDP networks can be trained to make correct judgements of grammaticality we have an existence proof that there is enough information in the stimulus to permit learning grammar by inductive means alone. This seems inconsistent superficially with Gold's theorem and at a deeper level with the fact that networks are designed on the basis of assumptions about the domain of the function to be learned. To clarify the issue I consider what we should learn from Gold's theorem, then go on to inquire into what it means to say that knowledge is domain specific. I first try sharpening the intuitive notion of domain specific knowledge by reviewing the alleged difference between processing limitations due to shortage of resources vs. shortage of knowledge. After rejecting different formulations of this idea, I suggest that a model is language specific if it transparently refers to entities and facts about language as opposed to entities and facts of more general mathematical domains. This is a useful but not necessary condition. I then suggest that a theory is domain specific if it belongs to a model family which is attuned in a law-like way to domain regularities. This leads to a comparison of PDP and parameter setting models of language learning. I conclude with a novel version of the poverty of stimulus argument.
TL;DR: Saleemi argues that the acquisition of language as a cognitive system can properly be understood by pairing the formal approach to learning, often known as learnability theory, with Chomsky's theory of Universal Grammar and its claim that human language is innately constrained, with some predefined space for variation.
Abstract: Anjum P. Saleemi argues that the acquisition of language as a cognitive system can properly be understood by pairing the formal approach to learning, often known as learnability theory, with Chomsky's theory of Universal Grammar and its claim that human language is innately constrained, with some predefined space for variation. Focusing on specific areas of syntax, such as binding theory and the null subject parameter, Dr Saleemi unites learnability theory's methodology with Chomsky's principles-and-parameters model, and construes acquisition as a function of linguistic principles with largely domain-specific learning procedures, mediated by environmental input. The aim of this study is to show that a self-contained linguistic theory cannot by itself be psychologically plausible, but depends on a compatible theory of learning which embraces developmental as well as formal issues.
TL;DR: A general learning framework which uses random sets is introduced for solving discrete-space classification problems and results show that this random set implementation is computationally competitive with more established methods (which use empirically proven heuristics).
Abstract: A general learning framework which uses random sets is introduced for solving discrete-space classification problems. This framework is based on the pac-learning formalism introduced by Valiant (1984) and generalized in set-theoretic terms by Blumer, et al., (1989). The random set version of this theory is used to develop an algorithm which is a particularly efficient search scheme. This is accomplished by recasting the representational class and constructive proof presented in Valiant (1984) into random set terms and implementing it as an exhaustive search algorithm. The algorithm is a problem-specific incremental (psi) approach in that it satisfies learnability criteria for distribution-specific problems as examples are being sampled. Some theoretical and empirical analyses are presented to demonstrate the convergent pac-learnability and sample complexity of this psi-algorithm. Its performance is then tested on the multiplexor class of problems. This class has been analyzed by others as a benchmark for decision trees and genetic classifiers. Results from these test cases show that, despite using an exhaustive search, this random set implementation is computationally competitive with these more established methods (which use empirically proven heuristics). Conclusions are drawn about potential further improvements in the efficiency of this approach.
TL;DR: A class of genetic algorithms for learning Boolean conjuncts and disjuncts is presented and analyzed in the context of the distribution-free learnability model and the main result provides the number of generations and the size of the population sufficient for the GA to accomplish the learning task.
Abstract: Genetic algorithms are probabilistic search techniques with information processing capabilities similar to the mechanisms of inheritance and adaptation found in biological systems (e.g., populations, biased selection, recombination, mutation).
In this dissertation, a class of genetic algorithms for learning Boolean conjuncts and disjuncts is presented and analyzed in the context of the distribution-free learnability model. In this model, the task of the genetic learner is to identify a concept that with high probability is not too different from the correct (target) concept: The concept must be 'probably approximately correct' (hence 'PAC'). The learner has access to an error-free teacher (oracle) which provides pre-classified training examples upon request. These examples are chosen at random according to a probability distribution on which we do not place any assumptions (hence 'distribution-free'), except that it does not change over time.
The analysis of this genetic learner is complete: Given any reasonable recombination operator and any confidence and accuracy level, the main result provides the number of generations and the size of the population sufficient for the GA to accomplish the learning task. This is the first convergence result of genetic algorithms within the distribution-free learnability model.
Other results in this dissertation include (1) a fundamental tradeoff between the selection operator and the size of the population, (2) an average-case analysis of a special type of genetic PAC learners where the training examples are processed in parallel, and (3) the effects of mutation on the genetic learning process.
TL;DR: This research takes powerful classes of formulas whose learnability is unknown or provably intractable, and then considers restricted cases where the number of different times a single variable may appear in the formula is limited to a small constant.
Abstract: Many learning problems can be phrased in terms of finding a close approximation to some unknown target formula f, based on observing f's value on a sample of points either drawn at random according to some underlying distribution, or perhaps selected by a learner for algorithmic reasons. In this research our goal is to prove theorems about what classes of formulas permit such learning in polynomial time (using the definitions of either Valiant's PAC model or Angluin's exact identification model). In particular we take powerful classes of formulas whose learnability is unknown or provably intractable, and then consider restricted cases where the number of different times a single variable may appear in the formula is limited to a small constant. We prove positive learnability results in several such cases, given either added assumptions on the underlying distribution of random points or the ability of the learner to select some of the sample points. We provide polynomial time learning algorithms for decision trees and monotone disjunctive normal form (DNF) formulas when variables appear at most some arbitrary constant number of times, given that the sample points are chosen uniformly. Over arbitrary distributions, we show algorithms that choose their own sample points, besides using random examples, to closely approximate the same class of decision trees and the class of DNF formulas where variables appear at most twice. For arbitrary formulas, we give a number of algorithms for the read-once case (where variables appear only once) over different bases (the functions computed at the formula's nodes). Besides identification algorithms for large classes of boolean read-once formulas, these results include new interpolation algorithms for classes of rational functions, and a membership query algorithm for a new class of neural networks.
TL;DR: Formal Grammar presents recent work in phonology, morphology, semantics, and neurolinguistics, focusing on the relationship between grammatical formalisms and their realizations.
Abstract: The second volume in the Vancouver Studies in Cognitive Science series, this collection presents recent work in the fields of phonology, morphology, semantics, and neurolinguistics. Its overall theme is the relationship between the contents of grammatical formalisms and their real-time realizations in machine or biological systems. Individual essays address such topics as learnability, implementability, computational issues, parameter setting, and neurolinguistic issues. Contributors include Janet Dean Fodor, Richard T. Oehrle, Bob Carpenter, Edward P. Stabler, Elan Dresher, Arnold Zwicky, Mary-Louise Kean, and Lewis P. Shapiro.
TL;DR: This work considers PAC-learning where the distribution is known to the student and addresses the learnability properties of classes of distributions.
Abstract: We consider PAC-learning where the distribution is known to the student. The problem addressed here is characterizing when learnability with respect to distribution D1 implies learnability with respect to distribution D2. The answer to the above question depends on the learnability model. If the number of examples need not be bounded by a polynomial, it is sufficient to require that all sets which have zero probability with respect to D2 have zero probability with respect to D1. If the number of examples is required to be polynomial, then the probability with respect to D2 must be bounded by a multiplicative constant from that of D1. More stringent conditions must hold if we insist that every hypothesis consistent with the examples be close to the target. Finally, we address the learnability properties of classes of distributions.
TL;DR: The boundedness of A′-movement in American Sign Language (ASL) is discussed, and some of the psycholinguistic questions regarding the learnability and acquisition of these constructions are raised.
Abstract: There are many psycholinguistic questions concerning the so-called ‘Island Constraints’ which restrict the range of A′-movement across languages. In this paper, I will discuss the boundedness of A′-movement in American Sign Language (ASL), and raise some of the psycholinguistic questions regarding the learnability and acquisition of these constructions. ASL is the visual-gestural language used by deaf people in the United States and parts of Canada. A′-movement in ASL is found, but it is bounded even more than A′-movement in English, as will be shown. However, ASL also allows null and overt resumptive pronouns to save potential island violations, so that many structures which might appear to be such violations are actually grammatical. In the first section of this paper, the facts of the boundedness of A′-movement in ASL will be discussed.
TL;DR: The authors compare the performances of a variety of algorithms in a reinforcement learning paradigm, including Ar-p, Ar-i, reinforcement-comparison (plus a new variation), and backpropagation of reinforcement gradient through a forward model to measure learnability, training time, and scaling.
Abstract: The authors compare the performances of a variety of algorithms in a reinforcement learning paradigm, including Ar-p, Ar-i, reinforcement-comparison (plus a new variation), and backpropagation of reinforcement gradient through a forward model. The task domain is discrete multioutput functions. Performance is measured in terms of learnability, training time, and scaling. Ar-p outperforms all others and scales well relative to supervised backpropagation. An ergodic variant of reinforcement-comparison approaches Ar-p performance. For the tasks studied, total training time (including model and controller) for the forward model algorithm is 1 to 2 orders of magnitude more costly than for Ar-p, and the controller's success is sensitive to forward model accuracy. Distortions of the reinforcement gradient predicted by an inaccurate forward model cause the controller's failures.
TL;DR: This paper will cover those models of learnability where ideas from Vapnik-Chervonenkis combinatorics have had the greatest impact.
Abstract: This paper surveys several models of learnability proposed and investigated by computational learning theorists during the past few years. Computational learning theory is the study of learning as seen from a computational complexity point of view. In addition to the usual space and time complexity, computational learning theory studies the sample complexity, the number of examples seen by the learner. (In a statistical setting, this is known as the sample size.) This paper will cover those models of learnability where ideas from Vapnik-Chervonenkis combinatorics have had the greatest impact. There are a few short proofs to give a flavor of some of the ideas involved, but most of the proofs are too long to be included here. The focus is on giving an idea of the variety of models and the relationships between them. For more complete surveys of computational learning theory see (1988), (1990), (1991), (1992), or the proceedings of the annual Workshop on Computational Learning Theory published by Morgan Kaufmann. Some attempt has been made to keep the notation consistent within this paper, which means that it will be inconsistent with a large subset of the references.
TL;DR: Human beings may not be the most admirable species on the planet, or the most likely to survive for another millennium, but they are without any doubt the most intelligent and the only species with language; the article asks how these two facts are related.
Abstract: We human beings may not be the most admirable species on the planet, or the most likely to survive for another millennium, but we are without any doubt at all the most intelligent. We are also the only species with language. What is the relation between these two obvious facts?
TL;DR: This paper examines the relation between task action grammar (TAG) and the method of semantic components, discusses the general architecture of TAG, and addresses the question of defining the TAG concept "simple task".
Abstract: Examines the relation between task action grammar (TAG) and the method of semantic components, discusses the general architecture of TAG, and addresses the question of defining the TAG concept "simple task." Developed by S. J. Payne and T. R. Green (1989), TAG is a method that aims to capture an aspect of the learnability of interface languages: the notion of regularity or consistency. TAG is applied to 5 design versions of an interface design case. For each version, a TAG description is given and checked for consistency. It is concluded that application of TAG could make interfaces easier to learn.
TL;DR: The linguistic theory presented in the book "Generalized Phrase Structure Grammar" is not a psychological theory of language use or language learning.
Abstract: Gazdar, Klein, Pullum, and Sag (1985) made it very clear in the introductory chapter of their book on Generalized Phrase Structure Grammar that what they were about to present was a linguistic theory, not a psychological theory of language use or language learning. They were prepared to acknowledge that some relationship between the two might be forged: “since a given linguistic theory will make specific claims about the nature of languages, it may well in turn suggest specific kinds of psycholinguistic hypotheses.” But their estimate of actual progress to date in identifying and testing such hypotheses was quite glum.