TL;DR: It has been proposed that social learning phenomena be subsumed within the categorization scheme currently used by investigators of asocial learning, and three alignments have been proposed, intended to be a set of hypotheses, rather than conclusions, about the mechanisms of social learning.
Abstract: There has been relatively little research on the psychological mechanisms of social learning. This may be due, in part, to the practice of distinguishing categories of social learning in relation to ill-defined mechanisms (Davis, 1973; Galef, 1988). This practice both makes it difficult to identify empirically examples of different types of social learning, and gives the false impression that the mechanisms responsible for social learning are clearly understood. It has been proposed that social learning phenomena be subsumed within the categorization scheme currently used by investigators of asocial learning. This scheme distinguishes categories of learning according to observable conditions, namely, the type of experience that gives rise to a change in an animal (single stimulus vs. stimulus-stimulus relationship vs. response-reinforcer relationship), and the type of behaviour in which this change is detected (response evocation vs. learnability) (Rescorla, 1988). Specifically, three alignments have been proposed: (i) stimulus enhancement with single stimulus learning, (ii) observational conditioning with stimulus-stimulus learning, or Pavlovian conditioning, and (iii) observational learning with response-reinforcer learning, or instrumental conditioning. If, as the proposed alignments suggest, the conditions of social and asocial learning are the same, there is some reason to believe that the mechanisms underlying the two sets of phenomena are also the same. This is so if one makes the relatively uncontroversial assumption that phenomena which occur under similar conditions tend to be controlled by similar mechanisms. However, the proposed alignments are intended to be a set of hypotheses, rather than conclusions, about the mechanisms of social learning; as a basis for further research in which animal learning theory is applied to social learning. 
A concerted attempt to apply animal learning theory to social learning, to find out whether the same mechanisms are responsible for social and asocial learning, could lead both to refinements of the general theory, and to a better understanding of the mechanisms of social learning. There are precedents for these positive developments in research applying animal learning theory to food aversion learning (e.g. Domjan, 1983; Rozin & Schull, 1988) and imprinting (e.g. Bolhuis, de Vos & Kruijt, 1990; Hollis, ten Cate & Bateson, 1991). Like social learning, these phenomena almost certainly play distinctive roles in the ontogeny of adaptive behaviour, and they are customarily regarded as 'special kinds' of learning (Shettleworth, 1993). (ABSTRACT TRUNCATED AT 400 WORDS)
TL;DR: The authors argue that sociocultural theory can provide an explanatory framework for understanding and refining the notions of how learners become competent members of a language learning community and also show how participation in this community is characterized by the learner's ability to develop, reflect upon, and refine their own language learning strategies.
Abstract: Sociocultural theory maintains that social interaction and cultural institutions, such as schools, classrooms, etc., have important roles to play in an individual's cognitive growth and development. The purpose of this paper is specifically to address the issue of the development of language learning strategies within sociocultural theory. Through a case study of an intact college French class whose learning was mediated by the use of a portfolio assessment procedure, we will attempt to show how reconfiguring the culture of the language classroom can contribute to the growth and development of strategic learning. We will argue that sociocultural theory can provide an explanatory framework for understanding and refining our notions of how learners become competent members of a language learning community (36). We will also show how participation in this community is characterized by the learner's ability to develop, reflect upon, and refine their own language learning strategies (16; 17). Evidence will be provided that supports the notion that the development of language learning strategies is mainly a by-product of mediation and socialization into a community of language learning practice. This approach differs from much of the research on learning strategies that emphasizes the identification of strategy types (25), variables affecting the choice of strategies (31), or investigations of their teachability and learnability (4; 29). Moreover, this perspective also questions the notion that "strategies are perhaps the product of one's cognitive style, personality, or hemispheric preference" (15: p. 199). Rather, sociocultural theory maintains that the emergence of strategies is a process directly connected to the practices of cultural groups through which novices develop into competent members of these communities.
TL;DR: A new model of learning probability distributions from independent draws is introduced, inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples, in the sense that it emphasizes efficient and approximate learning, and it studies the learnability of restricted classes of target distributions.
Abstract: We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled examples [24], in the sense that we emphasize efficient and approximate learning, and we study the learnability of restricted classes of target distributions. The distribution classes we examine are often defined by some simple computational mechanism for transforming a truly random string of input bits (which is not visible to the learning algorithm) into the stochastic observation (output) seen by the learning algorithm. In this paper, we concentrate on discrete distributions over $\{0, 1\}^n$. The problem of inferring an approximation to an unknown probability distribution on the basis of independent draws has a long and complex history in the pattern recognition and statistics literature. For instance, the problem of estimating the parameters of a Gaussian density in high-dimensional space is one of the most studied statistical problems. Distribution learning problems have often been investigated in the context of unsupervised learning, in which a linear mixture of two or more distributions is generating the observations, and the final goal is not to model the distributions themselves, but to predict from which distribution each observation was drawn. Data clustering methods are a common tool here. There is also a large literature on nonparametric density estimation, in which no assumptions are made on the unknown target density. Nearest-neighbor approaches to the unsupervised learning problem often arise in the nonparametric setting. While we obviously cannot do justice to these areas here, the books of Duda and Hart [9] and Vapnik [25] provide excellent overviews and introductions to the pattern recognition work, as well as many pointers for further reading. See also Izenman’s recent survey article [16].
Roughly speaking, our work departs from the traditional statistical and pattern recognition approaches in two ways. First, we place explicit emphasis on the computational complexity of distribution learning. It seems fair to say that while previous research has provided an excellent understanding of the information-theoretic issues involved in dis-
TL;DR: It is proposed that good performance will be manifest when both ecological validity and learnability are high, but that performance will be poor when one of these is low.
Abstract: Frequently the same biases have been manifest in experts as in students in the laboratory, but expertise studies are often no more ecologically valid than laboratory studies because the methods used in both are similar. Further, real-world tasks vary in their learnability, or the availability of outcome feedback necessary for a judge to improve performance with experience. We propose that good performance will be manifest when both ecological validity and learnability are high, but that performance will be poor when one of these is low. Finally, we suggest how researchers and practitioners might use these task-analytic constructs in order to identify true expertise for the formulation of decision support.
TL;DR: In this paper, a general framework for identifying meaningful structures in empirical data, namely, structures permitting effective organization of the data to meet requirements of future queries, has been proposed, whereby the notion of identifiability is given a precise formal definition similar to that of learnability.
Abstract: This paper presents several investigations into the prospects for identifying meaningful structures in empirical data, namely, structures permitting effective organization of the data to meet requirements of future queries. We propose a general framework whereby the notion of identifiability is given a precise formal definition similar to that of learnability. Using this framework, we then explore if a tractable procedure exists for deciding whether a given relation is decomposable into a constraint network or a CNF theory with desirable topology and, if the answer is positive, identifying the desired decomposition. Finally, we address the problem of expressing a given relation as a Horn theory and, if this is impossible, finding the best k-Horn approximation to the given relation. We show that both problems can be solved in time polynomial in the length of the data.
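One way to make the Horn-expressibility question concrete is the standard characterisation that a set of 0/1 tuples is the model set of some Horn theory exactly when it is closed under componentwise AND. The sketch below checks that closure directly; it is an illustration of that textbook property, not the procedure from the paper, and the function name `is_horn_expressible` and the tuple encoding are assumptions made here.

```python
from itertools import combinations


def is_horn_expressible(relation):
    """Return True if the set of 0/1 tuples is closed under componentwise AND.

    Closure under intersection is the classical characterisation of relations
    that can be written as a Horn theory (illustrative check only).
    """
    models = set(relation)
    # Checking all pairs suffices: closure under pairwise AND implies closure
    # under AND of any non-empty subset of models.
    for a, b in combinations(models, 2):
        meet = tuple(x & y for x, y in zip(a, b))
        if meet not in models:
            return False
    return True


# Closed under AND, so some Horn theory has exactly these models.
horn_ok = is_horn_expressible({(1, 1, 0), (1, 0, 0), (0, 0, 0)})
# (1,0) AND (0,1) = (0,0) is missing, so no Horn theory fits exactly.
not_horn = is_horn_expressible({(1, 0), (0, 1)})
```

The pairwise loop runs in time quadratic in the number of tuples, which is consistent with the abstract's claim of solvability in time polynomial in the length of the data, although the paper's own algorithm may differ.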
TL;DR: The learnable sublanguage of the restricted first-order logics known as “description logics” appears to be incomparable in expressive power to any subset of first-order logic previously known to be learnable.
Abstract: Although there is an increasing amount of experimental research on learning concepts expressed in first-order logic, there are still relatively few formal results on the polynomial learnability of first-order representations from examples. Most previous analyses in the pac-model have focused on subsets of Prolog, and only a few highly restricted subsets have been shown to be learnable. In this paper, we will study instead the learnability of the restricted first-order logics known as “description logics”, also sometimes called “terminological logics” or “KL-ONE-type languages”. Description logics are also subsets of predicate calculus, but are expressed using a different syntax, allowing a different set of syntactic restrictions to be explored. We first define a simple description logic, summarize some results on its expressive power, and then analyze its learnability. It is shown that the full logic cannot be tractably learned. However, syntactic restrictions exist that enable tractable learning from positive examples alone, independent of the size of the vocabulary used to describe examples. The learnable sublanguage appears to be incomparable in expressive power to any subset of first-order logic previously known to be learnable.
TL;DR: This work investigates the learnability of a typical description logic, CLASSIC, and shows that CLASSIC sentences are learnable in polynomial time in the exact learning model using equivalence queries and membership queries (which are in essence, “subsumption queries”).
Abstract: Description logics, also called terminological logics, are commonly used in knowledge-based systems to describe objects and their relationships. We investigate the learnability of a typical description logic, CLASSIC, and show that CLASSIC sentences are learnable in polynomial time in the exact learning model using equivalence queries and membership queries (which are in essence, “subsumption queries”). We show that membership queries alone are insufficient for polynomial time learning of CLASSIC sentences. Combined with earlier negative results of Cohen and Hirsh showing that, given standard complexity theoretic assumptions, equivalence queries alone are insufficient (or random examples alone in the PAC setting are insufficient), this shows that both sources of information are necessary for efficient learning in that neither type alone is sufficient. In addition, we show that a modification of the algorithm deals robustly with persistent malicious two-sided classification noise in the membership queries with the probability of a misclassification bounded below 1/2.
TL;DR: A new theory of generalization in neural network types of learning machines is presented, and estimates of the generalization error are obtained in both post-training and during the training process for general linear and nonlinear machines.
Abstract: This thesis presents a new theory of generalization in neural network types of learning machines. The new theory can be viewed as a refinement of the decision theoretical framework of learning based on the uniform weak law in probability theory (i.e., the VC-method), and leads to a finer degree of approximation than hitherto available. The role played by the VC-theory in studying learning problems becomes evident in the new framework. Indeed, the intrinsic limitation of the VC-theory in assessment of generalization error is demonstrated. The focus is on assessment and improvement of generalization performance when there is only a finite number of examples. In a unified framework, the theory provides systematic answers to the problems of learnability, assessment of generalization error, temporal dynamics of generalization, and design of machine complexity. Under conditions weaker than those required for distribution-free (or Probably Approximately Correct) learning, it proves a kind of learnability for both fixed and varying machine structures, and gives rates of growth of machine size for attaining learnability in the latter case. The theory introduces a new method for assessing the generalization performance, and obtains estimates of the generalization error both post-training and during the training process for general linear and nonlinear machines. These results contribute to the problem of how generalization error is related to the number of examples and machine complexity, and provide answers to the open problems of when learning should be stopped and how the complexity of the machine affects the generalization error during the training process, thus providing a precise language for describing the over-training phenomenon. The results on generalization error estimation lead to criteria for choosing the correct size of machines and the optimal stopping time simultaneously so that near-optimal generalization performance is attained.
These criteria find connections with Akaike's Information Criterion and the Minimum Description Length Principle for machine size selection, and shed light on the properties of the latter. The effects of regularization on the generalization error as well as the relation between regularization and early stopping are analyzed. These results in turn provide guidelines for choosing the regularization function. The results of this thesis are relevant to problems of regression, pattern recognition, statistical function estimation, and stochastic approximation.
TL;DR: This article investigated the role of grammatical consciousness-raising (C-R), a cognitive approach to grammatical instruction developed by Sharwood Smith (1981) and Rutherford (1987), and found that it helps learners at certain levels with certain aspects of grammar.
Abstract: Perhaps one of the questions most often raised by language teaching professionals is whether students should be taught grammar and if it really helps. It would be most welcome if there were a definite yes or no answer. If it were demonstrated that grammatical instruction does not help in any circumstances, we need not bother about it and could then turn to other ways of teaching methodology. If it could be shown conclusively to help, then we would need to know in what circumstances and how to go about it. Unfortunately, the question is not nearly so simple. The best answer we can currently offer is that it helps for certain learners at certain levels with certain aspects of grammar. If this is the case, the question is worth pursuing, and the research agenda is to spell out each of these conditions. This paper focuses on the question of which aspects of grammar call for instruction and why. We investigate the role of grammatical consciousness-raising (C-R), a cognitive approach to grammatical instruction developed by Sharwood Smith (1981) and Rutherford (1987). The results of a classroom study on second language learners’ acquisition of “ergative” verbs in English suggest that this approach is viable. We shall argue that in light of both empirical results and learnability considerations, certain areas of grammar call for some form of grammatical instruction, to which C-R can be an effective approach. For and against grammatical instruction: Grammar instruction in general has been in and out of language methodologies following the pendulum swing from grammar-driven audiolingual methods to communicative approaches which consider grammar as something peripheral.
TL;DR: It is argued that U-learnability is more appropriate than PAC for Universal (Turing computable) languages and allows a unified characterisation of speed-up learning and inductive learning.
Abstract: Inductive Logic Programming (ILP) involves the construction of first-order definite clause theories from examples and background knowledge. Unlike both traditional Machine Learning and Computational Learning Theory, ILP is based on lock-step development of Theory, Implementations and Applications. ILP systems have successful applications in the learning of structure-activity rules for drug design, semantic grammar rules, finite element mesh design rules and rules for prediction of protein structure and mutagenic molecules. The strong applications in ILP can be contrasted with relatively weak PAC-learning results (even highly-restricted forms of logic programs are known to be prediction-hard). It has been recently argued that the mismatch is due to distributional assumptions made in application domains. These assumptions can be modelled as a Bayesian prior probability representing subjective degrees of belief. Other authors have argued for the use of Bayesian prior distributions for reasons different to those here, though this has not led to a new model of polynomial-time learnability. Incorporation of Bayesian prior distributions over time-bounded hypotheses in PAC leads to a new model called U-learnability. It is argued that U-learnability is more appropriate than PAC for Universal (Turing computable) languages. Time-bounded logic programs have been shown to be polynomially U-learnable under certain distributions. The use of time-bounded hypotheses enforces decidability and allows a unified characterisation of speed-up learning and inductive learning. U-learnability has as special cases PAC and Natarajan's model of speed-up learning.
TL;DR: The second volume in the Vancouver Studies in Cognitive Science series as discussed by the authors presents recent work in the fields of phonology, morphology, semantics, and neurolinguistics, focusing on the relationship between the contents of grammatical formalisms and their real-time realizations in machine or biological systems.
Abstract: The second volume in the Vancouver Studies in Cognitive Science series, this collection presents recent work in the fields of phonology, morphology, semantics, and neurolinguistics. Its overall theme is the relationship between the contents of grammatical formalisms and their real-time realizations in machine or biological systems. Individual essays address such topics as learnability, implementability, computational issues, parameter setting, and neurolinguistic issues. Contributors include Janet Dean Fodor, Richard T. Oehrle, Bob Carpenter, Edward P. Stabler, Elan Dresher, Arnold Zwicky, Mary-Louise Kean, and Lewis P. Shapiro.
TL;DR: The Linguistische Arbeiten [Linguistic Studies] series, comprising over 500 volumes, has made a significant contribution to the development of linguistic theory both in Germany and internationally.
Abstract: Over the past few decades, the book series Linguistische Arbeiten [Linguistic Studies], comprising over 500 volumes, has made a significant contribution to the development of linguistic theory both in Germany and internationally. The series will continue to deliver new impulses for research and maintain the central insight of linguistics that progress can only be made in acquiring new knowledge about human languages both synchronically and diachronically by closely combining empirical and theoretical analyses. To this end, we invite submission of high-quality linguistic studies from all the central areas of general linguistics and the linguistics of individual languages which address topical questions, discuss new data and advance the development of linguistic theory.
TL;DR: The learnability of Read-k-Satisfy-j (RkSj) DNF formulae is studied, and it is shown that this class of functions is learnable in polynomial time, using Equivalence and Membership Queries.
Abstract: We study the learnability of Read-k-Satisfy-j (RkSj) DNF formulae. These are DNF formulae in which the maximal number of occurrences of a variable is bounded by k, and the number of terms satisfied by any assignment is at most j. We show that this class of functions is learnable in polynomial time, using Equivalence and Membership Queries, as long as $k \cdot j = O(\log n / \log\log n)$. Learnability was previously known only in the case that both k and j are constants. We also present a family of boolean functions that have short (poly(n)) Read-2-Satisfy-1 DNF formulae but require CNF formulae of size $2^{\Omega(n)}$. Therefore, our result does not seem to follow from the recent learnability result of [Bsh93].
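The two parameters in the RkSj definition can be computed directly for a small formula. The following is a minimal sketch, assuming a hypothetical encoding in which a DNF is a list of terms and each term maps a variable name to the truth value it requires; the satisfy number is found by brute force over all assignments, so this is only meant for toy inputs and is unrelated to the learning algorithm itself.

```python
from collections import Counter
from itertools import product


def read_number(dnf):
    """k: the maximal number of terms in which any single variable occurs."""
    counts = Counter(var for term in dnf for var in term)
    return max(counts.values(), default=0)


def satisfy_number(dnf):
    """j: the maximal number of terms satisfied by one assignment (brute force)."""
    variables = sorted({var for term in dnf for var in term})
    best = 0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        satisfied = sum(
            all(assignment[var] == wanted for var, wanted in term.items())
            for term in dnf
        )
        best = max(best, satisfied)
    return best


# (x1 AND x2) OR (NOT x1 AND x3)
example = [{"x1": True, "x2": True}, {"x1": False, "x3": True}]
k = read_number(example)     # x1 occurs in both terms
j = satisfy_number(example)  # the terms disagree on x1, so at most one holds
```

Under this encoding the example is a Read-2-Satisfy-1 formula, since `k` is 2 and `j` is 1.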
TL;DR: The purpose of this paper is to prepare a formal framework for studying “polynomial-time” query learnability, and introduces necessary notation and clarify notions that are necessary for discussing polynomial-time query learning.
Abstract: Query learning is to learn a concept (i.e., a representation of some language) through communication with a teacher (i.e., someone who knows the concept). The purpose of this paper is to prepare a formal framework for studying “polynomial-time” query learnability. We introduce necessary notation and, by using several examples, clarify notions that are necessary for discussing polynomial-time query learning.
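The learner/teacher interaction described here can be sketched with a toy concept class. The example below assumes the hidden concept is a monotone conjunction over n boolean variables, and it simplifies the equivalence query to a yes/no answer with no counterexample; the `Teacher` class, its method names, and the `learn` function are all hypothetical illustrations rather than notation from the paper.

```python
class Teacher:
    """Knows a hidden monotone conjunction, given as a set of variable indices."""

    def __init__(self, n, relevant):
        self.n = n
        self.relevant = frozenset(relevant)
        self.membership_count = 0

    def member(self, assignment):
        """Membership query: does this assignment satisfy the hidden concept?"""
        self.membership_count += 1
        return all(assignment[i] for i in self.relevant)

    def equivalent(self, hypothesis):
        """Equivalence query, simplified here to a yes/no answer."""
        return frozenset(hypothesis) == self.relevant


def learn(teacher):
    """Identify the conjunction with exactly n membership queries."""
    hypothesis = set()
    for i in range(teacher.n):
        probe = [True] * teacher.n
        probe[i] = False  # switch off one variable at a time
        if not teacher.member(probe):
            hypothesis.add(i)  # the concept fails without variable i
    return hypothesis


teacher = Teacher(5, {1, 3})
learned = learn(teacher)
confirmed = teacher.equivalent(learned)
```

The learner asks one membership query per variable, so the number of queries grows linearly in n, which is the kind of resource bound a "polynomial-time" query learnability framework is meant to capture.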
TL;DR: This book discusses Syntactic Theory and First Language Acquisition, Syntactic Bootstrapping and the Acquisition of Noun Meanings: The Mass-Count Issue, and the Separation of Universal Principles and Language-Specific Principles.
Abstract: Volume 1: Heads, Projections, and Learnability. Contents: B. Lust, I. Barbier, C. Foley, G. Hermon, S. Kapur, J. Kornfilt, Z. Nunez del Prado, M. Suner, J. Whitman, General Introduction: Syntactic Theory and First Language Acquisition: Cross-Linguistic Perspectives. J. Whitman, I. Barbier, K. Boser, S. Kapur, J. Kornfilt, B. Lust, Volume I Introduction: Constraining Structural Variation and the Acquisition Problem. Part I: Syntactic Foundations: Phrase Structure Principles and Parameters. C.-T.J. Huang, More on Chinese Word Order and Parametric Theory. L. Haegeman, Negative Heads and Negative Operators: The NEG Criterion. K. Hale, S.J. Keyser, Constraints on Argument Structure. Part II: Functional Categories and Phrase Structure in the Initial State. Section A: Heads and Projections in Morphosyntax. J. Grimshaw, Minimal Projection and Clause Structure. B. Lust, Functional Projection of CP and Phrase Structure Parameterization: An Argument for the Strong Continuity Hypothesis. K. Demuth, On the Underspecification of Functional Categories in Early Grammars. A. Radford, Tense and Agreement Variability in Child Grammars of English. Y. Otsu, Case-Marking Particles and Phrase Structure in Early Japanese Acquisition. J. Kornfilt, Some Remarks on the Interaction of Case and Word Order in Turkish: Implications for Acquisition. C. McKee, What You See Isn't Always What You Get. Section B: The V-2 Debate. J. Weissenborn, Constraining the Child's Grammar: Local Well-Formedness in the Development of Verb Movement in German and French. V. Deprez, Underspecification, Functional Projections, and Parameter Setting. J. Whitman, In Defense of the Strong Continuity Account of the Acquisition of Verb-Second. Part III: Learnability. L. Gleitman, H. Gleitman, A Picture Is Worth a Thousand Words, But That's the Problem: The Role of Syntax in Vocabulary Acquisition. G. Chierchia, Syntactic Bootstrapping and the Acquisition of Noun Meanings: The Mass-Count Issue. S. Flynn, G. Martohardjono, Mapping from the Initial State to the Final State: The Separation of Universal Principles and Language-Specific Principles.
Volume 2: Binding, Dependencies, and Learnability. Contents: B. Lust, I. Barbier, C. Foley, G. Hermon, S. Kapur, J. Kornfilt, Z. Nunez del Prado, M. Suner, J. Whitman, General Introduction: Syntactic Theory and First Language Acquisition: Cross-Linguistic Perspectives. B. Lust, J. Kornfilt, G. Hermon, C. Foley, Z. Nunez del Prado, S. Kapur, Introduction to Volume 2: Constraining Binding, Dependencies and Learnability: Principles or Parameters? Part I: Syntactic Foundations: Anaphora and Binding. J. Koster, Toward a New Theory of Anaphoric Binding. C-C.J. Tang, A Note on Relativized SUBJECT for Reflexives in Chinese. Y. Li, The Japanese Dialectal Zisin and Its Theoretical Implications: A Contrast with Chinese Ziji. G. Hermon, Long-Distance Reflexives in UG: Theoretical Approaches and Predictions for Acquisition. Part II: Lexical Anaphors and Pronouns. C. Jakubowicz, Reflexives in French and Danish: Morphology, Syntax, and Acquisition. R. Mazuka, B. Lust, When Is an Anaphor Not an Anaphor? D. Kaufman, Grammatical or Pragmatic: Will the Real Principle B Please Stand? C. Koster, Problems with Pronoun Acquisition. E. Reuland, Commentary: The Nonhomogeneity of Condition B and Related Issues. Part III: 'Pro Drop'. L. Rizzi, Early Null Subjects and Root Null Subjects. V. Valian, Children's Postulation of Null Subjects: Parameter Setting and Language Acquisition. N. Hyams, Commentary: Null Subjects in Child Language and the Implications of Cross-Linguistic Variation. D. Lillo-Martin, Setting the Null Argument Parameters: Evidence from American Sign Language and Other Languages. A. Pierce, On the Differing Status of Subject Pronouns in French and English Child Language. C.S. Smith, Pragmatic Principles in Coreference. Part IV: WH- and Quantifier Scope. T. Roeper, J. De Villiers, Lexical Links in the Wh-Chain. Y-C. Chien, Structural Determinants of Quantifier Scope: An Experimental Study of Chinese First Language Acquisition. J. Whitman, Scope and Optionality: Comments on the Papers on Wh-Movement and Quantification. Part V: Learnability. J.D. Fodor, How to Obey the Subset Principle: Binding and Locality. D. Lightfoot, Degree-0 Learnability. R. Clark, Finitude, Boundedness, and Complexity: Learnability and the Study of First Language Acquisition. S. Kapur, Some Applications of Formal Learning Theory Results to Natural Language Acquisition. A. Joshi, Commentary: Some Remarks on the Subset Principle.
TL;DR: It is shown in this paper that when more than one coach is used in a game-playing environment, the collective learning result is better than other learning curves in which only a single coach is involved, no matter whether the coach is the best one or the worst one.
Abstract: Explores the concept of diversified selection by employing multiple coaches in a game-playing program with a genetic algorithm (GA) based learning module. Although the importance of diversity in choosing offspring in a gene pool has been addressed in the past, few authors have discussed how to maintain diversity in real-world applications. Most existing suggestions are based on a balanced distribution of candidates, but this is not a realistic assumption for search problems in a multidimensional space. We show in this paper that when more than one coach is used in a game-playing environment, the collective learning result is better than other learning curves in which only a single coach is involved, no matter whether the coach is the best one or the worst one. We also use expanded chromosomes for measuring position scores in a static evaluation function to achieve improved learnability. Our work can be classified under the evolutionary strategy paradigm mentioned by K. De Jong and W. Spears (1993).
TL;DR: This paper studies the computational power of polynomial-time query learning systems for several query types and the computational complexity of a “learning problem” for several representation classes, and proves some polynomial-time nonlearnability results.
Abstract: In this paper we study the computational power of polynomial-time query learning systems for several query types, and the computational complexity of a “learning problem” for several representation classes. As corollaries of our results, we prove some polynomial-time nonlearnability results, and relate polynomial-time learnability of some representation classes to the complexity of representation finding problems of P/poly oracles. For example, for CIR, a representation class by logical circuits, it is shown that P(NP 1 () ) is an upper bound on the power of query learning systems for CIR, and that P(NP 1 () ) is also a lower bound on the power of query learning systems for CIR when they are used to learn a certain subclass R of CIR. It is also shown that the problem of learning CIR is P(NP(NP 1 () ))-solvable. Then, using these results, the following relations are proved: (1) If, for some A ∈ P/poly, the representation finding problem of A is not in P(NP 1 A ), then CIR is not polynomial-time query learnable even by using queries such as membership, equivalence, subset, superset, etc. (2) On the other hand, if the above-mentioned subclass R of CIR is not polynomial-time query learnable by using subset and superset queries, then some B ∈ P/poly exists such that its representation finding problem is not in P(NP 1 B ).
TL;DR: This work identifies conditions under which a learning algorithm that meets them can be transformed into a noise-tolerant algorithm.
Abstract: We consider the question of learning in the presence of classification noise. More specifically, we address the issue of identifying conditions under which a learning algorithm that meets them can be transformed into a noise-tolerant algorithm.
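As a hedged illustration of the classification noise setting (not the transformation studied in the paper), the sketch below assumes the standard random classification noise model, in which each label is flipped independently with a rate `eta` below 1/2. Under that assumption, a hypothesis with true error `e` disagrees with the noisy labels at rate `eta + e * (1 - 2 * eta)`, so ranking hypotheses by noisy disagreement preserves their ranking by true error. The function names are illustrative, not taken from the source.

```python
def noisy_disagreement(true_error: float, eta: float) -> float:
    """Expected disagreement rate with noisy labels (assumed random classification noise)."""
    if not 0.0 <= eta < 0.5:
        raise ValueError("noise rate eta must lie in [0, 0.5)")
    if not 0.0 <= true_error <= 1.0:
        raise ValueError("true_error must lie in [0, 1]")
    # Disagreement happens when the hypothesis is right and the label was flipped,
    # or when the hypothesis is wrong and the label was left intact.
    return (1.0 - true_error) * eta + true_error * (1.0 - eta)


def recover_true_error(noisy_rate: float, eta: float) -> float:
    """Invert the relation above to estimate the true error from the noisy rate."""
    if not 0.0 <= eta < 0.5:
        raise ValueError("noise rate eta must lie in [0, 0.5)")
    return (noisy_rate - eta) / (1.0 - 2.0 * eta)
```

Because the map from true error to noisy disagreement is increasing whenever `eta < 0.5`, the inversion is well defined; knowing `eta` exactly is itself an assumption of this sketch.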
TL;DR: Based on the uniform distribution PAC learning model, the learnability of monotone disjunctive normal form formulas with at most O(log n) terms is investigated, and an algorithm that learns O(log n)-term MDNF in polynomial time is given.
Abstract: Based on the uniform distribution PAC learning model, the learnability of monotone disjunctive normal form formulas with at most O(log n) terms (O(log n)-term MDNF) is investigated. Using the technique of restriction, an algorithm that learns O(log n)-term MDNF in polynomial time is given.
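To make the objects in this entry concrete, the following minimal sketch represents a monotone DNF formula as a list of terms, each a set of un-negated variable indices, and estimates the error of a hypothesis against a target under the uniform distribution by sampling. It does not implement the restriction-based learning algorithm from the paper; the representation, function names, sample size, and seed are assumptions made for illustration.

```python
import random
from typing import FrozenSet, List, Sequence

# A monotone DNF: a disjunction of terms, each a conjunction of un-negated variables.
MDNF = List[FrozenSet[int]]


def evaluate(formula: MDNF, x: Sequence[bool]) -> bool:
    """Return True if at least one term has all of its variables set to True."""
    return any(all(x[i] for i in term) for term in formula)


def uniform_error(target: MDNF, hypothesis: MDNF, n: int,
                  samples: int = 10_000, seed: int = 0) -> float:
    """Estimate Pr[target(x) != hypothesis(x)] for x uniform over {0,1}^n."""
    rng = random.Random(seed)
    mistakes = 0
    for _ in range(samples):
        x = [rng.random() < 0.5 for _ in range(n)]
        if evaluate(target, x) != evaluate(hypothesis, x):
            mistakes += 1
    return mistakes / samples
```

For example, `[frozenset({0, 1}), frozenset({2})]` encodes the formula (x0 AND x1) OR x2.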
TL;DR: It is shown that uniform boundedness of the metric entropy of the class of decision rules is both necessary and sufficient for learnability under each of two conditions: (i) the family of probability measures is totally bounded with respect to the total variation metric, and (ii) the family of probability measures contains an interior point when equipped with the same metric.
Abstract: In this paper, uniformly consistent estimation (learnability) of decision rules for pattern classification under a family of probability measures is investigated. In particular, it is shown that uniform boundedness of the metric entropy of the class of decision rules is both necessary and sufficient for learnability under each of two conditions: (i) the family of probability measures is totally bounded, with respect to the total variation metric, and (ii) the family of probability measures contains an interior point, when equipped with the same metric. In particular, this shows that insofar as uniform consistency is concerned, when the family of distributions contains a total variation neighborhood, nothing is gained by this knowledge about the distribution. Then two sufficient conditions for learnability are presented. Specifically, it is shown that learnability with respect to each of a finite collection of families of probability measures implies learnability with respect to their union; also, learnability with respect to each of a finite number of measures implies learnability with respect to the convex hull of the corresponding families of uniformly absolutely continuous probability measures.
TL;DR: The present paper deals with the learnability of indexed families of uniformly recursive languages from positive data under various postulates of naturalness by considering set-driven and rearrangement-independent learners, i.e., learning devices whose output depends exclusively on the range, and on the range and length, of their input, respectively.
Abstract: The present paper deals with the learnability of indexed families of uniformly recursive languages from positive data under various postulates of naturalness. In particular, we consider set-driven and rearrangement-independent learners, i.e., learning devices whose output depends exclusively on the range, and on the range and length, of their input, respectively. We study how set-drivenness and rearrangement-independence affect the learning power of learners, depending on the hypothesis space the learners may use. Furthermore, we consider the influence of set-drivenness and rearrangement-independence for learning devices that realize the subset principle to different extents. Thereby we distinguish between strong-monotonic, monotonic and weak-monotonic or conservative learning.
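The definition of a set-driven learner can be illustrated with a toy example. The sketch below assumes a hypothetical indexed family L_k = {0, ..., k} over the natural numbers (not a family from the paper) and a learner that guesses the largest element seen so far. Its output depends only on the range of the input, so it is set-driven, and every set-driven learner is also rearrangement-independent.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence


def guess_index(prefix: Sequence[int]) -> Optional[int]:
    """Guess k for the toy family L_k = {0, ..., k} from a finite prefix of positive data."""
    content = set(prefix)  # only the range of the input is consulted
    return max(content) if content else None


def same_on_rearrangements(learn: Callable[[Sequence[int]], Optional[int]],
                           prefix: Sequence[int]) -> bool:
    """Check that a learner gives one answer on every reordering of a prefix."""
    guesses = {learn(list(p)) for p in permutations(prefix)}
    return len(guesses) == 1
```

The helper `same_on_rearrangements` only tests reorderings of equal length, which is the rearrangement-independence condition; set-drivenness additionally requires the same guess when elements are repeated, as `guess_index([3, 1, 2])` and `guess_index([2, 2, 3, 1, 1])` illustrate.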
TL;DR: In this article, the authors show the hardness of PAC-learning for networks with a particular piecewise-linear activation function; the result follows from the NP-completeness of the loading problem for such networks.
Abstract: This paper deals with learnability of concept classes defined by neural networks, showing the hardness of PAC-learning (in the complexity, not merely information-theoretic sense) for networks with a particular class of activation. The obstruction lies not with the VC dimension, which is known to grow slowly; instead, the result follows from the fact that the loading problem is NP-complete. (The complexity scales badly with input dimension; the loading problem is polynomial-time if the input dimension is constant.) Similar and well-known theorems had already been proved by Megiddo and by Blum and Rivest, for binary-threshold networks. It turns out the general problem for continuous sigmoidal-type functions, as used in practical applications involving steepest descent, is not NP-hard (there are “sigmoidals” for which the problem is in fact trivial), so it is an open question to determine what properties of the activation function cause difficulties. Ours is the first result on the hardness of loading networks which do not consist of binary neurons; we employ a piecewise-linear activation function that has been used in the neural network literature. Our theoretical results lend further justification to the use of incremental (architecture-changing) techniques for training networks.
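The contrast between binary-threshold and piecewise-linear neurons can be sketched as follows. The abstract does not name the activation it uses, so the saturated linear function below is only an assumption: it is one common piecewise-linear choice, shown next to a binary threshold for comparison.

```python
def binary_threshold(x: float) -> float:
    """Binary neuron: output jumps from 0 to 1 at the threshold."""
    return 1.0 if x >= 0.0 else 0.0


def saturated_linear(x: float) -> float:
    """Piecewise-linear neuron: 0 below 0, identity on [0, 1], 1 above 1."""
    return min(1.0, max(0.0, x))
```

Unlike the threshold, the saturated linear function is continuous and takes every value in [0, 1], which is why hardness results for binary neurons do not carry over to it automatically.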
TL;DR: The authors show that arbitrary linguistic features and arbitrary complex tree structures can indeed also be learned by a connectionist parsing system.
Abstract: Due to their robustness, learnability and ease of integration of different information sources, connectionist parsing systems have proven to be applicable for parsing spoken language. However, most proposed connectionist parsers do not compute and represent complex structures. These parsers assign only a very limited structure to a given input string. For spoken language translation and database access, more detailed syntactic and semantic representation is needed. In the present paper, the authors show that arbitrary linguistic features and arbitrary complex tree structures can indeed also be learned by a connectionist parsing system.
TL;DR: The measure of efficiency is applied to prove the superiority of class comprising learning algorithms over class preserving learning which itself turns out to be superior to exact learning algorithms.
Abstract: In the present paper we study the learnability of the enumerable families L of uniformly recursive languages in dependence on the number of allowed mind changes, i.e., with respect to a well-studied measure of efficiency. We distinguish between exact learnability (L has to be learnt w.r.t. the hypothesis space L itself), class preserving learning (L has to be inferred w.r.t. some hypothesis space G having the same range as L), and class comprising inference (L has to be inferred w.r.t. some hypothesis space G that has a range including range(L)) as well as between learning from positive and negative examples. The measure of efficiency is applied to prove the superiority of class comprising learning algorithms over class preserving learning which itself turns out to be superior to exact learning algorithms. In particular, we considerably improve results obtained previously and show that a suitable choice of the hypothesis space may result in a considerable speed-up of learning algorithms, even if instead of positive and negative data only positive examples will be presented. Furthermore, we completely separate all modes of learning with a bounded number of mind changes from class preserving learning that avoids overgeneralization.
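The efficiency measure used in this entry, the number of mind changes, can be made concrete with a small counter over the sequence of hypotheses a learner emits. This is a sketch under an assumed convention: `None` stands for "no hypothesis yet", and only a switch between two different committed hypotheses counts as a mind change. The function name is illustrative.

```python
from typing import Hashable, Iterable, Optional


def count_mind_changes(hypotheses: Iterable[Optional[Hashable]]) -> int:
    """Count how often a learner replaces one committed hypothesis with a different one."""
    changes = 0
    previous: Optional[Hashable] = None
    for h in hypotheses:
        if h is None:
            continue  # the learner has not committed to a guess at this step
        if previous is not None and h != previous:
            changes += 1
        previous = h
    return changes
```

For instance, the hypothesis sequence `[None, 1, 1, 3, 3, 2]` contains two mind changes (1 to 3, then 3 to 2); the first guess itself is not counted.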
TL;DR: Upper bounds on the number of data features that suffice for model-based recognition tasks are derived; these bounds grow linearly with the task complexity, measured via the VC-dimension of the class of objects one deals with.
Abstract: We analyze the amount of information needed to carry out various model-based recognition tasks, in the context of a probabilistic data collection model. We focus on objects that may be described as semi-algebraic subsets of a Euclidean space, and on a wide class of object transformations, including perspective and affine transformations of 2D objects, and perspective projections of 3D objects. Our approach borrows from computational learning theory. We draw close relations between recognition tasks and a certain learnability framework. We then apply basic techniques of learnability theory to derive upper bounds on the number of data features that (provably) suffice for drawing reliable conclusions. The bounds are based on a quantitative analysis of the complexity of the hypothesis class that one has to choose from. Our central tool is the VC-dimension, which is a well-studied parameter measuring the combinatorial complexity of families of sets. It turns out that these bounds grow linearly with the task complexity, measured via the VC-dimension of the class of objects one deals with.
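The linear dependence on the VC-dimension can be illustrated with a general-purpose PAC sample bound. The sketch below uses the classical bound of Blumer, Ehrenfeucht, Haussler and Warmuth, m >= max((4/eps) log2(2/delta), (8d/eps) log2(13/eps)), as a stand-in; it is not the recognition-specific bound derived in the paper, and the function name is an assumption.

```python
import math


def pac_sample_bound(vc_dim: int, epsilon: float, delta: float) -> int:
    """Sample size sufficient for (epsilon, delta)-PAC learning a class of VC-dimension vc_dim."""
    if vc_dim < 1:
        raise ValueError("vc_dim must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie in (0, 1)")
    confidence_term = (4.0 / epsilon) * math.log2(2.0 / delta)
    # This term is proportional to the VC-dimension for fixed epsilon.
    complexity_term = (8.0 * vc_dim / epsilon) * math.log2(13.0 / epsilon)
    return math.ceil(max(confidence_term, complexity_term))
```

Once the complexity term dominates, doubling `vc_dim` doubles the bound (up to rounding), which is the kind of linear growth the abstract describes.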
TL;DR: Within the present paper, case-based representability as well as case-based learnability of indexed families of uniformly recursive languages are investigated, both with respect to an arbitrary fixed similarity measure.
Abstract: Within the present paper we investigate case-based representability as well as case-based learnability of indexed families of uniformly recursive languages. Since we are mainly interested in case-based learning with respect to an arbitrary fixed similarity measure, case-based learnability of an indexed family requires its representability first.