TL;DR: The Gradual Learning Algorithm (GLA) is a constraint-ranking algorithm for learning optimality-theoretic grammars; the paper argues that it can learn free variation, deal effectively with noisy learning data, and account for gradient well-formedness judgments.
Abstract: The Gradual Learning Algorithm (Boersma 1997) is a constraint-ranking algorithm for learning optimality-theoretic grammars. The purpose of this article is to assess the capabilities of the Gradual Learning Algorithm, particularly in comparison with the Constraint Demotion algorithm of Tesar and Smolensky (1993, 1996, 1998, 2000), which initiated the learnability research program for Optimality Theory. We argue that the Gradual Learning Algorithm has a number of special advantages: it can learn free variation, deal effectively with noisy learning data, and account for gradient well-formedness judgments. The case studies we examine involve Ilokano reduplication and metathesis, Finnish genitive plurals, and the distribution of English light and dark /l/.
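The abstract does not spell out the update rule, so the following is only a minimal sketch of a GLA-style learner as it is commonly described for stochastic Optimality Theory: each constraint carries a numeric ranking value, each evaluation adds Gaussian noise to those values, and an error nudges the values by a small plasticity. The function names `evaluate` and `gla_update`, the constraint names, the violation profiles, and the noise and plasticity settings are illustrative assumptions, not details taken from the paper.

```python
import random

def evaluate(ranking, candidates, noise=2.0, rng=random):
    """Pick the winning candidate under a noisy version of the ranking."""
    # Perturb every ranking value to obtain this evaluation's selection points.
    points = {c: value + rng.gauss(0.0, noise) for c, value in ranking.items()}
    order = sorted(points, key=points.get, reverse=True)
    # Strict domination: compare violation profiles lexicographically,
    # highest-ranked constraint first; fewer violations win.
    return min(candidates, key=lambda cand: [candidates[cand].get(c, 0) for c in order])

def gla_update(ranking, candidates, observed, plasticity=0.1, noise=2.0, rng=random):
    """Adjust ranking values in place after one observed (adult) form."""
    learner = evaluate(ranking, candidates, noise, rng)
    if learner == observed:
        return  # no error, no learning
    for c in ranking:
        v_observed = candidates[observed].get(c, 0)
        v_learner = candidates[learner].get(c, 0)
        if v_observed > v_learner:
            ranking[c] -= plasticity  # constraint penalises the observed form: demote
        elif v_learner > v_observed:
            ranking[c] += plasticity  # constraint penalises the learner's error: promote

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical input with two output variants, each violating one constraint.
    candidates = {"variant_a": {"C1": 1}, "variant_b": {"C2": 1}}
    ranking = {"C1": 100.0, "C2": 100.0}
    for _ in range(5000):
        # Assumed learning data with free variation: 70% variant_a, 30% variant_b.
        observed = "variant_a" if rng.random() < 0.7 else "variant_b"
        gla_update(ranking, candidates, observed, rng=rng)
    print(ranking)
```

Because updates are small and evaluation is noisy, a single stray datum moves the ranking values only slightly, which is the intuition behind the robustness to noisy data and the ability to match variation frequencies that the abstract claims.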
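TL;DR: A collection of chapters on learnability, including an overview of learnability (S. Bertolo), learnability and the acquisition of syntax (M. Atkinson), and the structural triggers learner (W. G. Sakas and J. D. Fodor).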
Abstract: 1. A brief overview of learnability S. Bertolo 2. Learnability and the acquisition of syntax M. Atkinson 3. Language change and learnability I. Roberts 4. Information theory, complexity and linguistic descriptions R. Clark 5. The structural triggers learner W. G. Sakas and J. D. Fodor.
TL;DR: The fundamental goal of a theory of second language acquisition (SLA) is to explain the acquisition of competence in a second language; this seemingly straightforward formulation includes some not-so-innocent presuppositions that should be made explicit.
Abstract: Introduction: SLA theory The fundamental goal of a theory of second language acquisition (SLA) is to explain the acquisition of competence in a second language. This may seem a straightforward enough formulation, but it includes some not-so-innocent presuppositions that should be made explicit. Especially we need to be careful with the terms ‘explain’, ‘acquisition’, and ‘competence’. I am assuming that a theory, any theory, has as its aim the explanation of some phenomena within its domain; not mere description, and not just prediction (Gregg, 1993). Of course, a sufficiently precise description of the phenomena is a requisite for a satisfying explanation, and successful predictions are useful evidence that the explanation is in fact correct. But the goal is explanation. I belabour this point because in fact it is anything but easy to agree on whether an explanation has been offered, let alone a successful one. I return to this question in the next section. In our case the phenomenon in question is linguistic competence – that is to say, knowledge of a language. This means that our domain is centrally and inevitably mental; while we are necessarily interested in the behaviour of L2 learners on the one hand, and in the characteristics of individual languages on the other, what the SLA theorist in fact wants to explain is not why L2 learners say such-and-such, or why certain languages have such-and-such a construction, but rather why learners have the knowledge they do have of an L2, and of course how they come to have it.
TL;DR: This paper shows that the notion of polynomial time learnability of p-concepts and stochastic rules with fixed range size using the KL divergence is in fact equivalent to the same notion using the quadratic distance, and hence any of the distances considered in [6] and [18]: the quadratic, variation, and Hellinger distances.
Abstract: We consider the problem of efficient learning of probabilistic concepts (p-concepts) and more generally stochastic rules in the sense defined by Kearns and Schapire [6] and by Yamanishi [18]. Their models extend the PAC-learning model of Valiant [16] to the learning scenario in which the target concept or function is stochastic rather than deterministic as in Valiant’s original model. In this paper, we consider the learnability of stochastic rules with respect to the classic ‘Kullback-Leibler divergence’ (KL divergence) as well as the quadratic distance as the distance measure between the rules. First, we show that the notion of polynomial time learnability of p-concepts and stochastic rules with fixed range size using the KL divergence is in fact equivalent to the same notion using the quadratic distance, and hence any of the distances considered in [6] and [18]: the quadratic, variation, and Hellinger distances. As a corollary, it follows that a wide range of classes of p-concepts which were shown to be polynomially learnable with respect to the quadratic distance in [6] are also learnable with respect to the KL divergence. The sample and time complexity of algorithms that would be obtained by the above general equivalence, however, are far from optimal. We present a polynomial learning algorithm with reasonable sample and time complexity for the important class of convex linear combinations of stochastic rules. We also develop a simple and versatile technique for obtaining sample complexity bounds for learning classes of stochastic rules with respect to the KL-divergence and quadratic distance, and apply them to produce bounds for the classes of probabilistic finite state acceptors (automata), probabilistic decision lists, and convex linear combinations. key words: PAC-learning, KL-divergence, quadratic-distance, stochastic rules, p-concepts
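To make the distance measures named above concrete, here is a small sketch that computes them for two discrete distributions over the same finite range. The exact normalisations used in [6] and [18] may differ (for example by constant factors or by averaging over inputs), so the definitions below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; infinite if q gives zero mass where p does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total

def quadratic_distance(p, q):
    """Sum of squared differences between the two probability vectors."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def variation_distance(p, q):
    """L1 distance between the two probability vectors."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

def hellinger_distance(p, q):
    """Squared Hellinger-style distance (unnormalised)."""
    return sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))

if __name__ == "__main__":
    target = [0.5, 0.5]
    hypothesis = [0.9, 0.1]
    for name, fn in [("KL", kl_divergence), ("quadratic", quadratic_distance),
                     ("variation", variation_distance), ("Hellinger", hellinger_distance)]:
        print(name, fn(target, hypothesis))
```

For a single pair of distributions, Pinsker's inequality gives KL ≥ (variation)²/2, and the quadratic distance never exceeds the squared variation distance, so a small KL divergence forces a small quadratic distance. The converse direction needs more care because the KL divergence is unbounded when the hypothesis assigns near-zero probability to an outcome; this remark is background and not a reproduction of the paper's proof.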
TL;DR: The product homomorphism method is developed, which gives polynomial PAC learning algorithms for a nonrecursive Horn clause with function-free ground background knowledge, if the background knowledge satisfies some structural properties.
TL;DR: Two approaches are proposed which take the specific distribution into account and allow us to derive explicit bounds on the deviation of the empirical error from the real error of a learning algorithm.
Abstract: The information theoretical learnability of folding networks, a very successful approach capable of dealing with tree structured inputs, is examined. We find bounds on the VC, pseudo-, and fat shattering dimension of folding networks with various activation functions. As a consequence, valid generalization of folding networks can be guaranteed. However, distribution independent bounds on the generalization error cannot exist in principle. We propose two approaches which take the specific distribution into account and allow us to derive explicit bounds on the deviation of the empirical error from the real error of a learning algorithm. The first approach requires the probability of large trees to be limited a priori and the second approach deals with situations where the maximum input height in a concrete learning example is restricted.
TL;DR: In this article, the authors show how UML can be used to specify behavior-oriented multi-agent models, focusing on activity graphs and the representation of different forms of interactions in these graphs.
Abstract: Developing multi-agent simulations seems to be rather straightforward, as active entities in the original correspond to active agents in the model. Thus plausible behaviors can be produced rather easily. However, for real world applications they must satisfy some requirements concerning verification, validation and reproducibility. Using a standard framework for designing a multi-agent model one can gain further advantages like fast learnability, wide understandability and possible transfer. In this paper we show how UML can be used to specify behavior-oriented multi-agent models. Therefore we focus on activity graphs and the representation of different forms of interactions in these graphs.
Abstract: This paper reconsiders the value of frequency information in ELT, taking into account new evidence provided by corpora of native speakers' English (e.g. the British National Corpus) and evidence available through new dictionaries and grammars making use of such corpus information. Some examples are given, showing how information about frequency in spoken and written English may cause re-appraisal of assumptions common in pedagogical grammar. It is argued that frequency as a principle for the selection and prioritising of language content has been neglected, and the availability of corpus-derived frequency information means that this neglect can now be rectified. However, frequency must be considered alongside other factors that have a bearing on sequencing in ELT materials, such as dispersion, coverage, learnability and communicative need. Also, it is important to bear in mind that findings based on corpora of native-speaker English must be complemented by those based on corpora of learner English, and of the native language of the learners.
TL;DR: The paper describes a series of ongoing studies examining what Lado (1957) hypothesized to represent maximum difficulty in second language pronunciation, namely, a phonemic split, and reports the results of an empirical study designed to test the explanatory adequacy of these principles.
Abstract: The research reported in this paper is intended as a contribution to the understanding of several well-known problems relating to the learning of phonemic contrasts in second language (L2) phonology. The paper describes a series of ongoing studies examining what Lado (1957) hypothesized to represent maximum difficulty in second language pronunciation, namely, a phonemic split. This is the process involved when an L2 learner must split native language (NL) allophones into separate target language (TL) phonemes. Two core principles of phonological theory are described and evaluated for their relevance in explaining the series of well-defined, implicationally-related stages involved in a phonemic split. Finally, the paper reports the results of an empirical study designed to test the explanatory adequacy of these principles, and concludes with a discussion of the implications of these studies for second language phonology in general.
TL;DR: A case study from language disorders supports and extends the notion of AGR to include verb-Prepositional Phrase relations, where the child shows no agreement in the Inflectional Phrase ("me can") or the Determiner Phrase ("them eyes").
TL;DR: The approach to learning introduced in the paper is believed to be significant in all problems where a nonlinear system has to be designed based on data, including direct inverse control and system identification.
Abstract: A notion of learnability is introduced, referred to as learnability with prior information (w.p.i.). This notion is weaker than the standard notion of probably approximately correct (PAC) learnability. A property called "dispersability" is introduced, and it is shown that dispersability plays a key role in the study of learnability w.p.i. Specifically, dispersability of a function class is always a sufficient condition for the function class to be learnable; moreover, in the case of concept classes, dispersability is also a necessary condition for learnability w.p.i. Thus in the case of learnability w.p.i., the dispersability property plays a role similar to the finite metric entropy condition in the case of PAC learnability with a fixed distribution. Next, the notion of learnability w.p.i. is extended to the distribution-free (d.f.) situation, and it is shown that a property called d.f. dispersability is always a sufficient condition for d.f. learnability w.p.i., and is also a necessary condition for d.f. learnability in the case of concept classes. The approach to learning introduced in the paper is believed to be significant in all problems where a nonlinear system has to be designed based on data. This includes direct inverse control and system identification.
TL;DR: A complexity measure for nominal functions is introduced, and upper bounds of the learnability of Augmented Naive Bayes in terms of that measure are proved.
Abstract: It is well-known that Naive Bayes can only represent linearly separable functions in binary domains. But the learnability of general Augmented Naive Bayes is open. Little work is done on the learnability of Bayesian networks in nominal domains, a general case of binary domains. This paper explores the learnability of Augmented Naive Bayes in nominal domains. We introduce a complexity measure for nominal functions, and prove upper bounds of the learnability of Augmented Naive Bayes in terms of that measure. Our results deepen our theoretical understanding of the learnability (and limitations) of Naive Bayes, Tree Augmented Naive Bayes, and general Augmented Naive Bayes with different levels of complexity.
TL;DR: Early text-to-speech (TTS) synthesis, influenced by the academic tradition of generative linguistics, assumed that a knowledge-based approach was essential to successful implementation; with the availability of computers as machine learners and large training databases, contemporary speech synthesis instead relies heavily on data-driven techniques.
Abstract: Speech synthesis is an emerging technology with a wide range of potential applications. In most such applications, the message to be spoken will be in the form of text input, so the main focus of development is text-to-speech (TTS) synthesis. Strongly influenced by the academic traditions of generative linguistics, early work on TTS systems took it as axiomatic that a knowledge-based approach was essential to successful implementation. Presumed theoretical constraints on the learnability of their native language by humans were applied by extension to machine learners to conclude the futility of trying to make useful ‘blank slate’ inferences about speech and language simply from exposure. This situation has changed dramatically in recent years with the easy availability of computers to act as machine learners and large databases to act as training resources. Many positive achievements in machine learning have comprehensively proven its usefulness in a range of natural language processing tasks, despite the negative assumptions of earlier times. Thus, contemporary speech synthesis relies heavily on data-driven techniques.
TL;DR: This paper addresses three key areas related to learnability: proposing a definition of learnability, showing where learnability and usability intersect, and providing a basis for learnability based on some attributes of human beings.
Abstract: Design of information used for technical communication of complex products should consider how learnable that information is, and strive to deliver materials that are inherently learnable. The speed of information interchange and the demands of the workplace and school curricula require increasingly minimalist approaches to the material that is made available. People are frustrated by long learning times, and new users of software tools demand rapid absorption of tool capabilities. In addition, many readers of technical information are people for whom English is not their native language. Methods and practices that worked in the period when people were willing to commit to hours of study to understand a topic, or days of practice to master a tool, no longer work in a world based on "internet time." To assist our understanding of these trends in learning, this paper addresses three key areas related to learnability: proposing a definition of learnability, showing where learnability and usability intersect, and providing a basis for learnability based on some attributes of human beings.
TL;DR: The model is implemented as a three-layer neural network that combines predictive perception, internal-state transitions and action selection into a loop that closes via the environment; the paper also considers hierarchical architectures consisting of several modules in one agent, as well as groups of several agents controlled by such networks.
TL;DR: A model of the origins of syllable systems that lends plausibility to the theory that language learning, and in particular phonological acquisition, does not need innate linguistically specific information, but is rather made possible by the interaction between general motor, perceptual, cognitive and social constraints through a self-organizing process.
Abstract: This paper presents a model of the origins of syllable systems that lends plausibility to the theory that language learning, and in particular phonological acquisition, does not need innate linguistically specific information, contrary to what many researchers of the Chomskyan school believe, but is rather made possible by the interaction between general motor, perceptual, cognitive and social constraints through a self-organizing process. The strategy is to situate the question of acquisition in a larger, evolutionary (cultural) framework: the model addresses the question of the origins of syllable systems (syllables are the major phonological units in speech). It is based on the artificial life methodology of building a society of agents, endowed with motor, perceptual and cognitive apparati that are generic and realistic. We show that agents effectively build sound systems and how these sound systems relate to existing human sound systems. Results concerning the learnability of the produced sound systems by fresh/baby agents are detailed: the critical period effect and the artificial language effect can effectively be predicted by our model. The ability of children to learn sound systems is explained by the evolutionary history of these sound systems, which were precisely shaped so as to fit the ecological niche formed by the brains and bodies of these children, and not the other way around (as advocated by Chomskyan approaches to language).
TL;DR: This paper states generalizations of formal results on the relative value of labeled and unlabeled data to the realistic case where a labeler is not a foolproof oracle but is instead somewhat unreliable and error-prone and concludes with a call for a rich, powerful and practical computational theory of data acquisition and truthing.
Abstract: The creation of a pattern classifier requires choosing or creating a model, collecting training data and verifying or "truthing" this data, and then training and testing the classifier. In practice, individual steps in this sequence must be repeated a number of times before the classifier achieves acceptable performance. The majority of the research in computational learning theory addresses the issues associated with training the classifier (learnability, convergence times, generalization bounds, etc.). While there has been modest research effort on topics such as cost-based collection of data in the context of a particular classifier model, there remain numerous unsolved problems of practical importance associated with the collection and truthing of data. Many of these can be addressed with the formal methods of computational learning theory. A number of these issues, as well as new ones -- such as the identification of "hostile" contributors and their data -- are brought to light by the Open Mind Initiative, where data is openly contributed over the World Wide Web by non-experts of varying reliabilities. This paper states generalizations of formal results on the relative value of labeled and unlabeled data to the realistic case where a labeler is not a foolproof oracle but is instead somewhat unreliable and error-prone. It also summarizes formal results on strategies for presenting data to labelers of known reliability in order to obtain best estimates of model parameters. It concludes with a call for a rich, powerful and practical computational theory of data acquisition and truthing, built upon the concepts and techniques developed for studying general learning systems.
TL;DR: Using a variant of the agnostic learning model, this work improves on sufficient conditions for a class of real-valued functions to be agnostically learnable with a particular relative accuracy; in particular, it improves by a factor of two the scale at which scale-sensitive dimensions must be finite in order to imply learnability.
Abstract: We consider the problem of classification using a variant of the agnostic learning model in which the algorithm's hypothesis is evaluated by comparison with hypotheses that do not classify all possible instances. Such hypotheses are formalized as functions from the instance space X to {0, *, 1}, where * is interpreted as "don't know". We provide a characterization of the sets of {0, *, 1}-valued functions that are learnable in this setting. Using a similar analysis, we improve on sufficient conditions for a class of real-valued functions to be agnostically learnable with a particular relative accuracy; in particular, we improve by a factor of two the scale at which scale-sensitive dimensions must be finite in order to imply learnability.
TL;DR: Uniform solvability of collections of solvable identification problems is rather influenced by the description of the problems than by the particular problems themselves, and the influence of the hypothesis spaces on uniform learnability is analysed.
Abstract: A classical learning problem in Inductive Inference consists of identifying each function of a given class of recursive functions from a finite number of its output values. Uniform learning is concerned with the design of single programs solving infinitely many classical learning problems. For that purpose the program reads a description of an identification problem and is supposed to construct a technique for solving the particular problem.
As can be proved, uniform solvability of collections of solvable identification problems is rather influenced by the description of the problems than by the particular problems themselves. When prescribing a specific inference criterion (for example learning in the limit), a clever choice of descriptions allows uniform solvability of all solvable problems, whereas even the most simple classes of recursive functions are not uniformly learnable without restricting the set of possible descriptions. Furthermore the influence of the hypothesis spaces on uniform learnability is analysed.
TL;DR: This paper studies a new restriction of the PAC learning framework, in which each label class is handled by an unsupervised learner that aims to fit an appropriate probability distribution to its own data, and gives an algorithm for learning monomials over input vectors generated by an unknown product distribution.
Abstract: In this paper we study a new restriction of the PAC learning framework, in which each label class is handled by an unsupervised learner that aims to fit an appropriate probability distribution to its own data. A hypothesis is derived by choosing, for any unlabeled instance, the label whose distribution assigns it the higher likelihood.
The motivation for the new learning setting is that the general approach of fitting separate distributions to each label class is often used in practice for classification problems. The set of probability distributions that is obtained is more useful than a collection of decision boundaries. A question that arises, however, is whether it is ever more tractable (in terms of computational complexity or sample-size required) to find a simple decision boundary than to divide the problem up into separate unsupervised learning problems and find appropriate distributions.
Within the framework, we give algorithms for learning various simple geometric concept classes. In the boolean domain we show how to learn parity functions, and functions having a constant upper bound on the number of relevant attributes. These results distinguish the new setting from various other well-known restrictions of PAC-learning. We give an algorithm for learning monomials over input vectors generated by an unknown product distribution. The main open problem is whether monomials (or any other concept class) distinguish learnability in this framework from standard PAC-learnability.
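The setting described in this abstract can be sketched as follows: one distribution is fitted to each label class using only that class's examples, and an unlabeled instance receives the label whose distribution gives it the higher likelihood. The sketch assumes boolean attributes and fits a smoothed product distribution per class; it is not the paper's algorithm and carries none of its guarantees. All names, the smoothing constant, and the toy target (the monomial x0 AND x1) are illustrative assumptions.

```python
import math
from itertools import product

def fit_product_distribution(samples, smoothing=1.0):
    """Estimate P(x_i = 1) per attribute from a single class's examples."""
    n = len(samples)
    dims = len(samples[0])
    return [(sum(s[i] for s in samples) + smoothing) / (n + 2 * smoothing)
            for i in range(dims)]

def log_likelihood(probs, x):
    """Log-probability of boolean vector x under independent attributes."""
    return sum(math.log(p if xi else 1.0 - p) for p, xi in zip(probs, x))

def classify(model_pos, model_neg, x):
    """Choose the label whose fitted distribution assigns x the higher likelihood."""
    return 1 if log_likelihood(model_pos, x) > log_likelihood(model_neg, x) else 0

if __name__ == "__main__":
    # Toy data: every 3-bit vector, labelled by the assumed target x0 AND x1.
    inputs = list(product([0, 1], repeat=3))
    positives = [x for x in inputs if x[0] and x[1]]
    negatives = [x for x in inputs if not (x[0] and x[1])]
    # Each "unsupervised learner" sees only its own class.
    model_pos = fit_product_distribution(positives)
    model_neg = fit_product_distribution(negatives)
    for x in inputs:
        print(x, classify(model_pos, model_neg, x))
```

Note that neither learner ever sees the other class's data or the decision boundary; the boundary only emerges when the two likelihoods are compared, which is what distinguishes this setting from directly searching for a separating hypothesis.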
TL;DR: It is shown that the learning power of finite and limit identification from good text examples coincide and, if learning from good informant examples is considered, limit identification is superior to finite identification in the class-preserving as well as in the class-comprising case.
Abstract: The present paper investigates identification of indexed families L of recursively enumerable languages from good examples. We distinguish class-preserving learning from good examples (the good examples have to be generated with respect to a hypothesis space having the same range as L) and class-comprising learning from good examples (the good examples have to be selected with respect to a hypothesis space comprising the range of L). A learner is required to learn a target language on every finite superset of the good examples for it. If the learner's first and only conjecture is correct then the underlying learning model is referred to as finite identification from good examples and if the learner makes a finite number of incorrect conjectures before always outputting a correct one, the model is referred to as limit identification from good examples. In the context of class-preserving learning, it is shown that the learning power of finite and limit identification from good text examples coincide. When class-comprising learning from good text examples is concerned, limit identification is strictly more powerful than finite learning. Furthermore, if learning from good informant examples is considered, limit identification is superior to finite identification in the class-preserving as well as in the class-comprising case. Finally, we relate the models of learning from good examples to one another as well as to the standard learning models in the context of Gold-style language learning.
TL;DR: A model of language acquisition based on the Minimum Description Length principle is presented; it considers the role of meanings and allows signals of arbitrary length, with the goal of establishing properties of compositional language relative to a more sound model of linguistic generalisation.
Abstract: Explanations for the evolution of compositional and recursive syntax have previously attributed these phenomena to the genetic evolution of the language acquisition device. Recent work in the field of computational evolutionary linguistics suggests that syntactic structure can instead be explained in terms of the dynamics arising from the cultural evolution of language. We build on this previous work by presenting a model of language acquisition based on the Minimum Description Length principle. Our Monte Carlo simulations show that the relative cultural stability of compositional language versus noncompositional language is greatest under conditions specific to hominids: a complex meaning space structure.
Introduction and Related Work
Human language differs greatly from other natural communication systems. Our use of compositional and recursive syntax places us in a unique position: we can comprehend and produce an ostensibly infinite number of utterances. Why are we alone in this position? Human language is a result of three adaptive systems: learning, genetic evolution, and cultural evolution. Over the past half century cognitive scientists have addressed the problem of learning. The past ten years have seen a resurgence of interest in the evolutionary basis of language (Pinker & Bloom, 1990). Only recently has the cultural evolution of language been seriously analysed. Hare & Elman (1994) outlined perhaps the first iterated learning model. The iterated learning model seeks to model the evolution of language through generations of language users, solely on the basis of each agent observing the behaviour of the previous generation (Kirby, in press b).
Recent demonstrations of the cultural evolution of compositionality and recursive syntax (Kirby, in press a; Batali, in press) suggest that these properties of human language, traditionally attributed to genetic evolution, can in fact be explained as emergent properties arising from the dynamics of iterated learning. One criticism levelled at these models is that the learning bias of the individual agents is typically too strong – the observed behaviour is striking yet inevitable (Tonkes & Wiles, in press). Here, we consider compositional syntax – the property of human language whereby the meaning of a signal is some function of the meaning of its parts. We address the criticisms of bias strength by employing the Minimum Description Length (MDL) principle, which rests on a solid mathematical justification for induction. We demonstrate that the relative stability of compositional language, with respect to non-compositional language, is at a maximum under two conditions specific to hominids: (a) a complex meaning space, and (b) limited language exposure, a situation commonly referred to as the poverty of the stimulus. Gell-Mann (1992) was perhaps the first to suggest the relevance of Kolmogorov Complexity, which is closely related to MDL, to the study of language evolution. Our use of the MDL principle is similar to that of Teal et al (1999), who model change in signal structure using the iterated learning model. Our model extends this work by considering the role of meanings, as well as allowing signals of arbitrary length. The structure of this article is as follows. First, we outline the MDL principle and introduce a novel hypothesis space. We then discuss issues of stability and learnability in the context of cultural evolution. Finally we illustrate the impact of meaning space complexity on the stability of compositional language. Our main goal is to establish properties of compositional language relative to a more sound model of linguistic generalisation.
Hypothesis Selection by MDL
Ranking potential hypotheses by minimum description length is a highly principled and very elegant approach to hypothesis selection (Li & Vitanyi, 1997). The MDL principle can be derived from Bayes’s Rule, and in short states that the best hypothesis for some observed data is the one that minimises the sum of (a) the encoding length of the hypothesis, and (b) the encoding length of the data, when represented in terms of the hypothesis. A tradeoff then exists between small hypotheses with a large data encoding length and large hypotheses with a small data encoding length. When the observed data contains no regularity, the best hypothesis is one that represents the data verbatim, as this minimises the data encoding length. However, when regularity does exist in the data, a smaller hypothesis is possible which describes the regularity, making it explicit, and as a result the hypothesis describes more than just the observed data. For this reason, the cost of encoding the data increases. MDL tells us the ideal tradeoff between the length of the hypothesis encoding and the length of the data encoding described relative to the hypothesis. More formally, given some observed data $D$ and a hypothesis space $H$, the best hypothesis $h_{MDL}$ is defined as:
$$h_{MDL} = \arg\min_{h \in H} \{ L_{C_1}(h) + L_{C_2}(D \mid h) \} \qquad (1)$$
where $L_{C_1}(h)$ is the length in bits of the hypothesis $h$ when using an optimal coding scheme over hypotheses. Similarly, $L_{C_2}(D \mid h)$ is the length, in bits, of the encoding of the observed data using the hypothesis $h$. We use the MDL principle to find the most likely hypothesis for an observed set of meaning/signal pairs passed to an agent. When regularity exists in the observed language, the hypothesis will capture this regularity, when justified, and allow for generalisation beyond what was observed. By employing MDL we have a more theoretically solid justification for generalisation.
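The tradeoff in equation (1) can be illustrated with a toy sketch that picks, from a tiny hypothesis space, the hypothesis with the smallest combined length. The two hypotheses, their bit costs, and the exception-listing code used for the data are invented for illustration; they are not the coding schemes of the paper.

```python
import math

def verbatim_lengths(data):
    """Hypothesis that stores the bit string literally: L(h) = len(data), L(D|h) = 0."""
    return len(data), 0

def repeat01_lengths(data):
    """Hypothesis 'the data is 01 repeated', with exceptions listed by position."""
    hypothesis_bits = 8  # assumed fixed cost of stating the pattern
    position_bits = math.ceil(math.log2(len(data)))  # bits to name one position
    mismatches = sum(1 for i, bit in enumerate(data) if bit != "01"[i % 2])
    return hypothesis_bits, mismatches * position_bits

def mdl_select(data, hypotheses):
    """Return the name of the hypothesis minimising L(h) + L(D | h)."""
    def total_length(name):
        hypothesis_bits, data_bits = hypotheses[name](data)
        return hypothesis_bits + data_bits
    return min(hypotheses, key=total_length)

if __name__ == "__main__":
    hypotheses = {"verbatim": verbatim_lengths, "repeat01": repeat01_lengths}
    regular = "01" * 16    # fully captured by the pattern hypothesis
    irregular = "0011" * 8  # half the positions are exceptions to the pattern
    print(mdl_select(regular, hypotheses))
    print(mdl_select(irregular, hypotheses))
```

When the pattern fits, the short hypothesis wins because nothing remains to be encoded; when it does not, the cost of listing exceptions outweighs the saving and the verbatim hypothesis is preferred, mirroring the behaviour described in the paragraph above.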
The next section will clarify the MDL principle – we introduce the hypothesis space and coding schemes.
The Hypothesis Space
We introduce a novel model for mapping strings of symbols to meanings, which we term a Finite State Unification Transducer (FSUT). This model extends the scheme used by Teal et al (1999) to include meanings and variable length strings. Given some observed data, the hypothesis space consists of all FSUTs which are consistent with the observed data. Both compositional and non-compositional languages can be represented using the FSUT model. Throughout this paper, a meaning is defined as a set of features represented by a vector, with each feature taking a value. A meaning space profile describes the structure of a meaning space. For example, the meaning space profile $\langle 3, 3 \rangle$ defines a meaning space with two dimensions, each dimension having three possible values. Signals are just strings of symbols (of arbitrary length) drawn from some alphabet $\Sigma$. A Finite State Unification Transducer is specified by a 6-tuple $\langle Q, \Sigma, \Omega, \delta, q_0, q_F \rangle$ where $Q$ is the set of states used by the transducer, $\Sigma$ is the alphabet from which symbols are drawn, and $\Omega$ is the meaning space profile which defines the structure of the meaning space. The transition function $\delta$ maps state/symbol pairs to a new state, along with the (possibly underspecified) meaning corresponding to that part of the transducer. Two states, $q_0$ and $q_F$, need to be specified: they are the initial and final state, respectively. Consider an agent A, which receives a set of meaning/signal pairs during acquisition. For example, a simple observed language might be the set:
$$L_1 = \{ \langle (2, 1), \mathit{cdef} \rangle, \langle (2, 2), \mathit{cdgh} \rangle, \langle (1, 2), \mathit{abgh} \rangle \}$$
Figure 1(a) depicts an FSUT which models $L_1$. We term this transducer the prefix tree transducer – the observed language and only the observed language is represented by the prefix tree transducer.
The power of the FSUT model only becomes apparent when we consider possible generalisations made by merging states and edges in the transducer. Figure 1(c) shows a compressed transducer. Here, some of the states and meaning labels attached to the edges in the prefix tree transducer have been merged. There are two merge operations:

1. State merge. Two states q1 and q2 can be merged if the transducer remains consistent. All edges that mention q1 or q2 now mention the new state.
2. Edge merge. Two edges e1 and e2 can be merged if they share the same source and target states and accept the same symbol. The result of merging the two edges is a new edge with a new meaning label. Meanings are merged by finding the intersection of the two component meanings. Those features which do not have values in common take the value ?, a wildcard which matches all values. As fragments of the meanings may be lost, a check for transducer consistency is also required.

Figure 1(b) illustrates which state merge operations are applied to the prefix tree transducer in order to compress it. Figure 1 is a simple example, as the resulting transducer does not generalise: only the observed meaning/signal pairs can be accepted or produced.
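The two merge operations can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the paper: edges are represented as `(source, symbol, target, meaning)` tuples, meanings are tuples of feature values, and the consistency check that both operations require is deliberately left out. The function names are hypothetical.

```python
WILDCARD = "?"


def merge_meanings(m1, m2):
    """Intersect two meanings feature by feature.

    A feature keeps its value when both meanings agree on it; otherwise
    it becomes the wildcard, which matches all values.
    """
    return tuple(a if a == b else WILDCARD for a, b in zip(m1, m2))


def merge_states(edges, q1, q2):
    """Redirect every edge that mentions q2 so that it mentions q1.

    Assumption: the caller has already established that the merge keeps
    the transducer consistent; no check is performed here.
    """
    return [(q1 if s == q2 else s, a, q1 if t == q2 else t, m)
            for (s, a, t, m) in edges]


def merge_edges(edges):
    """Collapse edges that share source, symbol and target into one edge
    whose meaning label is the intersection of the component meanings."""
    merged = {}
    for s, a, t, m in edges:
        key = (s, a, t)
        merged[key] = merge_meanings(merged[key], m) if key in merged else m
    return [(s, a, t, m) for (s, a, t), m in merged.items()]


# The first edges of two paths that both begin with the symbol "c".
edges = [(0, "c", 2, (2, 1)), (0, "c", 5, (2, 2))]
edges = merge_edges(merge_states(edges, 2, 5))
print(edges)
```

After states 2 and 5 are merged, the two edges share source, symbol and target, so the edge merge applies and the second feature, on which the meanings disagree, becomes the wildcard.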
TL;DR: This correspondence corrects and comments on an earlier paper on the equivalence relations between learnability, output-dissipativity and strict positive realness, refining the definitions of input-output invertibility and output-dissipativity and giving a diagram of how these concepts interrelate.
Abstract: This correspondence presents corrections and further comments to our previous paper titled 'Equivalence relations between learnability, output-dissipativity and strict positive realness' (Arimoto and Naniwa 2000). It corrects (1) the previous definition of input-output 'invertibility' by assuming an additional continuity property of the system inverse operator, redefines exactly (2) the 'output-dissipativity' by adding a condition of invertibility, and presents (3) a further comment on the proof of Lemma 1. A diagram is given in order to clearly show interrelationships between these concepts.
TL;DR: In this article, the authors study the problem of learning an unknown function represented as an expression over a known finite monoid, and show that for a group G, expressions over G are easily learnable if G is nilpotent and impossible to learn efficiently (under cryptographic assumptions) if G is nonsolvable.
Abstract: We study the problem of learning an unknown function represented as an expression over a known finite monoid. As in other areas of computational complexity where programs over algebras have been used, the goal is to relate the computational complexity of the learning problem with the algebraic complexity of the finite monoid. Indeed, our results indicate a close connection between both kinds of complexity. We focus on monoids which are either groups or aperiodic, and on the learning model of exact learning from queries. For a group G, we prove that expressions over G are easily learnable if G is nilpotent and impossible to learn efficiently (under cryptographic assumptions) if G is nonsolvable. We present some partial results for solvable groups, and point out a connection between their efficient learnability and the existence of lower bounds on their computational power in the program model. For aperiodic monoids, our results seem to indicate that the monoid class known as DA captures exactly learnability of expressions by polynomially many Evaluation queries.