TL;DR: It is demonstrated that the existence of a suitable data compression scheme is sufficient to ensure learnability and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
Abstract: We explore the learnability of two-valued functions from samples using the paradigm of Data Compression. A first algorithm (compression) chooses a small subset of the sample which is called the kernel. A second algorithm predicts future values of the function from the kernel, i.e. the algorithm acts as a hypothesis for the function to be learned. The second algorithm must be able to reconstruct the correct function values when given a point of the original sample. We demonstrate that the existence of a suitable data compression scheme is sufficient to ensure learnability. We express the probability that the hypothesis predicts the function correctly on a random sample point as a function of the sample and kernel sizes. No assumptions are made on the probability distributions according to which the sample points are generated. This approach provides an alternative to that of [BEHW86], which uses the Vapnik-Chervonenkis dimension to classify learnable geometric concepts. Our bounds are derived directly from the kernel size of the algorithms rather than from the Vapnik-Chervonenkis dimension of the hypothesis class. The proofs are simpler and the introduced compression scheme provides a rigorous model for studying data compression in connection with machine learning.
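The two-algorithm scheme described in this abstract can be illustrated with a minimal sketch. The concept class used here (threshold functions on the real line, with a kernel of at most one point) is an assumption chosen for illustration and is not taken from the paper; the names `compress` and `reconstruct` are likewise hypothetical.

```python
from typing import Callable, List, Tuple

# A labelled sample: (point, label) pairs with label in {0, 1}.
Sample = List[Tuple[float, int]]


def compress(sample: Sample) -> Sample:
    """Compression algorithm: choose the kernel, a small subset of the sample.

    Assumption for this sketch: the labels come from a threshold function
    f_t(x) = 1 if x >= t else 0, so the smallest positive point suffices.
    """
    positives = [(x, y) for x, y in sample if y == 1]
    return [min(positives)] if positives else []


def reconstruct(kernel: Sample) -> Callable[[float], int]:
    """Second algorithm: build a hypothesis from the kernel alone."""
    if not kernel:
        return lambda x: 0  # no positive point was seen: predict 0 everywhere
    threshold = kernel[0][0]
    return lambda x: 1 if x >= threshold else 0


# Sample labelled by the (hypothetical) threshold t = 0.6.
sample = [(0.2, 0), (0.9, 1), (0.5, 0), (0.7, 1), (1.4, 1)]
kernel = compress(sample)
hypothesis = reconstruct(kernel)

# The hypothesis must reproduce the correct value on every original sample point.
consistent = all(hypothesis(x) == y for x, y in sample)
```

Here the kernel size is at most one regardless of the sample size, which is the kind of quantity the abstract's bounds are stated in terms of.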
TL;DR: It is argued that L2 learners have access to semantic universals, and to how these interact with movement at the syntax-semantics interface, even before full acquisition of target form-to-meaning mappings.
Abstract: In this experimental study, we focus on the following semantic universal: if a habitual clause reading, then generic pronominal subject; if an episodic clause reading, then specific pronominal subject. We argue that although this set of two conditionals is a universal property of all natural languages, English-speaking second-language (L2) learners of Spanish must access it through the mediation of aspectual morphology. Because habitual and episodic readings are encoded by different functional morphemes in English and Spanish, the L2 acquisition of this semantic universal necessitates a significant restructuring of the native form-to-meaning mappings. Even more problematic from a learnability point of view is the negative constraint on generic pronominal subjects in sentences with the Preterite. We compare the acquisition of the universal computational mechanism and the negative constraint with acquisition of the prototypical habitual and episodic meanings of the Spanish Imperfect and Preterite with dynam...
TL;DR: The conjecture is made that classes of choice functions that represent a genuine aggregation of individual choices in a large society are never learnable, which would imply that the rationality hypothesis in some economic situations is wrong.
TL;DR: This paper focused on Korean English as a foreign language (EFL) learners' knowledge of the locative alternation (e.g., John loaded hay onto the wagon/John loaded the wagon with hay).
Abstract: The present research focuses on Korean English as a foreign language (EFL) learners’ knowledge of the locative alternation (e.g., John loaded hay onto the wagon/John loaded the wagon with hay) and ...
TL;DR: A model for interaction with spoken language interfaces applied to heterogeneous tasks for service robots is proposed, based on the idea of using a family of lifelike characters to facilitate learning and transfer between interfaces.
Abstract: In a future scenario where many devices can be controlled using the voice, easy and intuitive access will be crucial for avoiding cognitive overload when users are faced with many different systems and interaction models. We propose a model for interaction with spoken language interfaces applied to heterogeneous tasks for service robots, based on the idea of using a family of lifelike characters. We argue that we can signal important features of the speech interface by using certain visual cues. The aim is to facilitate learning and transfer between interfaces. We discuss challenges for dialogue design affecting learnability in the light of the speech interface constructed for our full-scale robot prototype CERO.
TL;DR: The authors investigated the acquisition of constraints on English locatives by Korean speakers, and whether the first language (L1) influences the second language (L2) acquisition of locative alternations.
Abstract: The present research focuses on Korean English as a foreign language (EFL) learners’ knowledge of the locative alternation (e.g., John loaded hay onto the wagon/John loaded the wagon with hay) and its relationship to theories of language-particular and language-universal properties. Korean, the native language of the participants, has a locative alternation resembling that of English. However, although Korean and English are similar in terms of broad-range constraints, they are dissimilar in terms of narrow-range constraints for locative alternations. This study investigates the acquisition of such constraints in English locatives by Korean speakers, and whether the first language (L1) influences the second language (L2) acquisition of locative alternations. Two instruments are used in the experiment: a forced-choice picture-description task and a forced-choice sentence-selection task. The study compares an experimental group of Korean learners of English with a control group of native speakers of English. The results are discussed with reference to the universality of linking, the transfer of argument structure, and Pinker’s learnability theory. The primary results are:
• The Korean learners of English had acquired the constructional meaning of the locative construction (which is related to Pinker’s (1989) concept of broad-range rules and broad conflation classes), a property claimed to be universal.
• They had not achieved native-speaker knowledge of language-particular properties, namely which narrow conflation class verbs belong to, so they did not reject ungrammatical sentences.
• Significant L1 transfer effects were not found.
TL;DR: In this paper, a process and tool is presented to compare given competing algorithms to a derived reference, such as a baseline or benchmark, and a result confidence as to the suitability of the competing algorithm to a given task is generated.
Abstract: In predictive data mining, a process and tool presents a method to compare given competing algorithms to a derived reference, such as a baseline or benchmark. A result confidence as to the suitability of the competing algorithm to a given task is generated. In an exemplary embodiment, a randomized feature acting, simple, algorithm is used to generate the baseline. In an alternative embodiment, the process and tool is used to determine learnability of the given task. A mechanism to account for overfitting of data is described.
TL;DR: It is proved that the computational complexity of the learnability of these formulas is completely determined by a simple algebraic property of the basis of relations: their clone of polymorphisms.
TL;DR: A generic framework or “reference model” called KBT-MM (knowledge based tutor meta-model) for knowledge based tutor authoring tools is described, which articulates a minimal but necessary set of features for knowledge based tutor authoring tools that aim for scope, depth, learnability, and productivity.
Abstract: While intelligent tutoring systems (ITSs), also called knowledge based tutors, are becoming more common and proving to be increasingly effective, each one must still be built from scratch at a significant cost. This paper discusses a number of design issues and design tradeoffs that are involved in building ITS authoring tools, and discusses the knowledge acquisition and representation “lessons learned” in our work. A generic framework or “reference model” called KBT-MM (knowledge based tutor meta-model) for knowledge based tutor authoring tools is described. The reference model articulates a minimal but necessary set of features for knowledge based authoring tools that aim for scope, depth, learnability, and productivity.
TL;DR: A neural-network-based approach to relational learning is introduced, in which the learned network can represent a combination of aggregate functions that summarize a set of related tuples and checks for the existence of specific kinds of elements in that set.
Abstract: Relational learners need to be able to handle the information contained in a set of related tuples. Most current relational learners are biased either towards the use of aggregate functions that summarize that set, or towards checking the existence of specific kinds of elements in that set. Learning patterns that contain a combination of both is a challenging task. In this paper we introduce a neural networks based approach to relational learning, where the neural net that is learned can actually represent such a combination. This capacity is illustrated on toy problems, but several questions are open with respect to learnability of more complicated concepts.
TL;DR: The authors compared the performance of two learning algorithms, EDCD and the GLA, on the metrical stress system of Classical Latin and found that EDCD cannot learn this system from overt forms only, whereas the GLA can, which suggests that the GLA may be a better model of acquisition than EDCD.
Abstract: Optimality-Theoretic learning algorithms are only guaranteed to be successful if the data fed to them contain full structural descriptions of the surface forms, i.e. descriptions that include hidden structure like metrical feet. This is not realistic as a model of acquisition, because children are only exposed to overt forms, e.g. unstructured strings of syllables. Optimality-Theoretic learning algorithms that learn solely from overt forms turn out to sometimes succeed and sometimes fail (Tesar & Smolensky 2000). This possibility of failure is a property of both on-line learning algorithms that have been proposed for OT, namely Error Driven Constraint Demotion (EDCD; Tesar 1995) and the Gradual Learning Algorithm (GLA; Boersma 1997). The possibility of failure is not necessarily bad: one would want an algorithm to fail for languages that do not exist, and to succeed for languages that do exist. Latin exists (or existed). This paper compares the performance of the two learning algorithms for the metrical stress system of Classical Latin. It turns out that EDCD cannot learn this system from overt forms only, and that the GLA can. This suggests that the GLA may be a better model of acquisition than EDCD. The results also provide evidence in the discussion in the literature about what is the correct linguistic analysis of Latin stress: if overt forms contain main stress only, the GLA makes the child posit an analysis that makes use of uneven trochees (like the analysis by Jacobs 2000) rather than strictly bimoraic trochees (like the analysis by Mester 1994 and Hayes 1995).
TL;DR: In this article, the learnability of HEFS in the query learning model using equivalence queries and additional queries such as membership, predicate membership, entailment membership, and dependency queries is investigated.
TL;DR: It turns out that EDCD cannot learn this system from overt forms only, and that the GLA can, which suggests that the GLA may be a better model of acquisition than EDCD.
TL;DR: This paper is a report on the initial findings of a study conducted in the project FunTain with the main purpose to find general guidelines for edutainment games, in order to guide designers of education games.
Abstract: This paper is a report on the initial findings of a study conducted in the project FunTain with the main purpose to find general guidelines for edutainment games, in order to guide designers of suc ...
TL;DR: The notion that language change consists purely of constraint reranking is incoherent, because languages must also differ lexically and cases of output-output correspondence in the literature actually reflect diachronic analogy.
Abstract: The notion that language change consists purely of constraint reranking is incoherent. Languages must also differ lexically. Cases of output-output correspondence in the literature actually reflect diachronic analogy, which is not a property of synchronic grammars, but is rather the result of lexical restructuring during acquisition.
TL;DR: The results suggest the learnability of the utility function classes defined by changing the user power (adjusted parameter) for each user's utility function.
Abstract: In this paper we use statistical learning theory to evaluate the performance of game theoretic power control algorithms for wireless data in arbitrary channels, i.e., no presumed channel model is required. To show the validity of statistical learning theory in this context, we studied a flat fading channel and, more specifically, simulated the case of a Rayleigh flat fading channel. With the help of a relatively small number of training samples, the results suggest the learnability of the utility function classes defined by changing the user power (adjusted parameter) for each user's utility function.
TL;DR: The erasing language generated by a pattern p is the set of all strings that can be obtained by substituting (possibly empty) strings of constant symbols for the variables in p.
Abstract: A pattern is a finite string of constant and variable symbols. The erasing language generated by a pattern p is the set of all strings that can be obtained by substituting (possibly empty) strings of constant symbols for the variables in p.
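The definition above can be made concrete with a small membership test. This is a sketch under stated assumptions: the pattern is given as a list of symbols, each occurrence of the same variable receives the same substitution (the usual convention for pattern languages, which the abstract does not spell out), and the function name `in_erasing_language` is hypothetical. It relies on the backtracking behaviour of Python's `re` module and is not meant to be efficient.

```python
import re
from typing import Dict, List, Set


def in_erasing_language(pattern: List[str], variables: Set[str],
                        alphabet: str, word: str) -> bool:
    """Decide whether `word` is in the erasing language of `pattern`.

    `pattern` is a list of symbols; symbols in `variables` are variables,
    all other symbols are constants drawn from `alphabet`.
    """
    # Any (possibly empty) string of constant symbols.
    any_constants = "[" + re.escape(alphabet) + "]*"
    group_of: Dict[str, str] = {}
    parts: List[str] = []
    for symbol in pattern:
        if symbol not in variables:
            parts.append(re.escape(symbol))  # a constant matches itself
        elif symbol not in group_of:
            # First occurrence of a variable: capture its substitution.
            group_of[symbol] = f"v{len(group_of)}"
            parts.append(f"(?P<{group_of[symbol]}>{any_constants})")
        else:
            # Later occurrence: must repeat the same substitution.
            parts.append(f"(?P={group_of[symbol]})")
    return re.fullmatch("".join(parts), word) is not None


# Pattern p = x a x over the constant alphabet {a, b}.
p = ["x", "a", "x"]
empty_substitution = in_erasing_language(p, {"x"}, "ab", "a")    # x := empty string
nonempty_substitution = in_erasing_language(p, {"x"}, "ab", "bab")  # x := "b"
not_a_member = in_erasing_language(p, {"x"}, "ab", "ab")
```

The first example is a member only because substitutions may be empty, which is what distinguishes erasing from non-erasing pattern languages.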
TL;DR: This dissertation presents a Multiple Three-Descriptor Representation (MTDR) algorithm, a novel algorithm for learning concept drift especially built for tracking the dynamics of multiple target concepts in the information filtering domain.
Abstract: Tracking the evolution of user interests is a problem instance of concept drift learning. Keeping track of multiple interest categories is a natural phenomenon as well as an interesting tracking problem because interests can emerge and diminish at different time frames. The first part of this dissertation presents a Multiple Three-Descriptor Representation (MTDR) algorithm, a novel algorithm for learning concept drift especially built for tracking the dynamics of multiple target concepts in the information filtering domain. The learning process of the algorithm combines the long-term and short-term interest (concept) models in an attempt to benefit from the strength of both models. The MTDR algorithm improves over existing concept drift learning algorithms in the domain.
Being able to track multiple target concepts with a few examples poses an even more important and challenging problem because casual users tend to be reluctant to provide the examples needed, and learning from a few labeled data is generally difficult. The second part presents a computational Framework for Extending Incomplete Labeled Data Stream (FEILDS). The system modularly extends the capability of an existing concept drift learner in dealing with incomplete labeled data stream. It expands the learner's original input stream with relevant unlabeled data; the process generates a new stream with improved learnability. FEILDS employs a concept formation system for organizing its input stream into a concept (cluster) hierarchy. The system uses the concept and cluster hierarchy to identify the instance's concept and unlabeled data relevant to a concept. It also adopts the persistence assumption in temporal reasoning for inferring the relevance of concepts. Empirical evaluation indicates that FEILDS is able to improve the performance of existing learners particularly when learning from a stream with a few labeled data.
Lastly, a new concept formation algorithm, one of the key components in the FEILDS architecture, is presented. The main idea is to discover intrinsic hierarchical structures regardless of the class distribution and the shape of the input stream. Experimental evaluation shows that the algorithm is relatively robust to input ordering, consistently producing a hierarchy structure of high quality.
TL;DR: The case in which an iterative learner has to learn only from fat texts (fat informants) is considered; in this case it is guaranteed that relevant information is, in principle, accessible at any time in the learning process.
TL;DR: This paper introduces advanced elementary formal systems (AEFSs), i.e., elementary formal systems which allow for the use of a certain kind of negation, which is nonmonotonic, in essence, and which is conceptually close to negation as failure.
TL;DR: This paper considers test feature classifiers recently introduced by Lashkia and Aleshin and shows that they meet all essential requirements to be of practical use in medical decision making, which are: ability to handle irrelevant attributes, high expressive power, high recognition ability, and ability to generate decisions by a set of rules.
TL;DR: It is demonstrated in this paper how one can effectively use cognitive walkthroughs and code walkthroughs for empirically modeling developers' characteristics and personae.
Abstract: This is the third in a series of reports about usability and learnability problems of integrated development environments (IDE). The first report used the method of ethnographic interviews, and the second report the methods of heuristic and psychometric evaluation to study problems that developers face in using IDEs. The present study extends previous work by applying the method of direct behavioral observation of more versus less experienced users of the same IDE for C++. We demonstrate in this paper how one can effectively use cognitive walkthroughs and code walkthroughs for empirically modeling developers' characteristics and personae.
TL;DR: A straightforward method for measuring ZPD-learning is proposed that focuses on the ongoing amount of hints or help that learners need as they solve problems; it is used for two purposes in evaluating/diagnosing student learning: dynamic evaluation that enables adaptive instruction, and formative evaluation that informs future ACAL design.
Abstract: Toward an Operational Definition of the Zone of Proximal Development for Adaptive Instructional Software. Tom Murray (School of Cognitive Science, Hampshire College, tmurray@hampshire.edu) and Ivon Arroyo (Computer Science, University of Massachusetts, ivon@cs.umass.edu).
Introduction
Measuring and comparing student learning in adaptive computer assisted learning (ACAL) systems is problematic because the system is trying to both model and change the user, and in this sense is chasing a moving target. Process-oriented metrics for measuring learning, such as the zone of proximal development (ZPD), can be more robust in such situations. Though the concept of the ZPD is often invoked in the context of instructional systems, it has not been operationalized in a manner that allows it to be used in ACAL. We propose a straightforward method for measuring ZPD-learning that focuses on the ongoing amount of hints or help that learners need as they solve problems. The ZPD is commonly used to articulate apprenticeship learning approaches, scaffolding and fading (note: references removed from this extended abstract, available from the authors), and authentic (situated) learning tasks. The ZPD describes a zone within which tasks are too difficult to accomplish without assistance, but which can be accomplished with some help. The ZPD has also been described in terms of a student's readiness to learn a new skill and in terms of the assessment of learning potential or learnability. These descriptions of the ZPD are useful for framing certain educational issues, but they are not defined in an operational way.
We argue that keeping the learner within this optimal zone could be described in several compatible ways:
• Putting a greater emphasis on monitoring learning process variables and maintaining efficient as well as effective learning;
• Cognitively, there is a goal of presenting material that is neither too easy nor too difficult;
• Affectively, there is a goal of avoiding the extremes of boredom and confusion (being overwhelmed);
• This can also be seen as maintaining a constant level of challenge (and support), or a constant rate of learning.
Figure 1: ZPD Illustration
Figure 1 illustrates our interpretation of the ZPD. It shows a state space diagram illustrating a student's trajectory through time in the space of tutorial content difficulty versus the student's evolving skill level. The goal is to give content that matches the student's evolving skill level by providing just the right amount of challenge.
An operational definition of the ZPD
We define a problem equivalency set (PES) as the set of all of the problems that address the same topic(s) at approximately the same level of difficulty. We define the specific ZPD (SZPD) to have three parameters that could be set in an ACAL: H, the goal number of hints in a PES; P, the minimum number of problems in a PES; and dH, the acceptable variation in H. Thus the goal is to keep the number of hints in a PES between H-dH and H+dH (while guaranteeing that the student sees at least P problems). The scheme has the following properties: it is non-monotonic (allows for learner forgetting and unsystematic error in the student model); forgiving (recent behavior has more weight); accommodating of different learning styles (e.g. gradual vs. normative vs. insightful learning); and tolerant of slips and guesses (one behavior can’t make a big difference). The SZPD parameters H, P, and dH in each tutoring system (or content module) are adjusted by a content expert to account for task difficulty calibration and the teacher's pedagogical style.
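The SZPD definition above can be sketched in a few lines. This is an illustration only, under two assumptions that the abstract leaves open: the "number of hints in a PES" is read as the total number of hints over the problems the student has seen in that PES, and the status labels are hypothetical names. The extra weighting of recent behavior mentioned among the scheme's properties is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SZPD:
    H: float   # goal number of hints in a PES
    dH: float  # acceptable variation in H
    P: int     # minimum number of problems in a PES


def zpd_status(hints_per_problem: List[int], zpd: SZPD) -> str:
    """Classify a student's position relative to the specific ZPD.

    `hints_per_problem` holds the hints used on each problem seen so far
    in one PES (assumption: hints are summed over the whole PES).
    """
    if len(hints_per_problem) < zpd.P:
        return "too_few_problems"  # the student must see at least P problems
    total_hints = sum(hints_per_problem)
    if total_hints < zpd.H - zpd.dH:
        return "below"             # fewer hints than the target band
    if total_hints > zpd.H + zpd.dH:
        return "above"             # more hints than the target band
    return "within"                # between H-dH and H+dH


# Hypothetical parameter values, as a content expert might set them.
zpd = SZPD(H=4, dH=1, P=3)
statuses = [
    zpd_status([1, 2], zpd),
    zpd_status([0, 1, 0], zpd),
    zpd_status([2, 1, 1], zpd),
    zpd_status([3, 3, 2], zpd),
]
```

An adaptive system could, for example, move to a harder PES on "below" and an easier one on "above"; that policy is a possible use of the classification and is not stated in the abstract.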
We are using this method for two purposes in evaluating/diagnosing student learning: dynamic evaluation that enables adaptive instruction, and formative evaluation that informs future ACAL design. In our post-hoc analysis of data from the Animalwatch arithmetic and fractions tutor (three studies over three years on a total of 350 subjects) we have used our ZPD approach to monitor hint flow in the analysis of the pedagogical model, the student model, and the content model of the tutor. To evaluate the pedagogical model, we analyzed trends in student model proficiency levels vs. problem difficulty; to evaluate content accuracy, we analyzed the assigned difficulty of a problem vs. the average number of mistakes students made on the problem; to evaluate student model accuracy, we compared trends in problem time, average mistakes, and student mastery over the Nth problem seen. We also look at trends in hints vs. problem solving time to assess whether students are authentically engaged in problem solving. Preliminary data, graphs, and analysis are available in other papers by the authors, and the analysis is still in progress.
TL;DR: An integrative approach exploring the relationship between learnability and word order is presented, incorporating syntactic theory, corpus analyses and computational modelling, and concludes that inconsistencies may be preserved in the language due to the interaction between several syntactic structures.
Abstract: Languages often demonstrate word order inconsistencies, and such inconsistencies ought to make languages harder to acquire. We present an integrative approach exploring the relationship between learnability and word order, incorporating syntactic theory, corpus analyses and computational modelling. We focus on comparisons between English and German, and conclude that inconsistencies may be preserved in the language due to the interaction between several syntactic structures.
TL;DR: 'Systemic productivity' obtains in pluridimensional paradigms and is explained without syntactic features (they are artifactual and their learnability is questionable).
Abstract: Putting a priority on the study of linguistic dynamics (i) eases their understanding with respect to each other and (ii) 'explains' grammatical properties as a side effect. There is no disjunction between a grammar (today) and the dynamics (tomorrow?). The levers are analogy and proximality: some accesses are proximal (= cheaper), others are more costly. A precise and operable model of linguistic productivity consists of: (a) a statics, the 'plexus', made of inscriptions that are meshed, exemplarist, and contextual, in which 'copositionings' take place among 'empty' terms (free of properties); and (b) dynamics which are cognitively founded. The syntactic analysis of an utterance is redefined as structure mappings piled up onto one another. 'Systemic productivity' obtains in pluridimensional paradigms and is explained without syntactic features (they are artifactual and their learnability is questionable). A model of acquisition predicts the sigmoid curves that are generally observed in acquisition.
TL;DR: Two paradigms of PAC learning are introduced, namely absolute PAC learning, which is independent of the representation of the class of hypotheses, and PAC learning with respect to the indexes, which heavily depends on such representations, and non-computable learnability in both contexts is characterized.
TL;DR: The characterizations in this work show the equivalence between the learnability of a concept class C using queries and the existence of a good query for any subset H of C which is guaranteed to reject a certain fraction of candidate concepts in H regardless of the answer.