TL;DR: The TRACE model, described in detail elsewhere, deals with short segments of real speech, and suggests a mechanism for coping with the fact that the cues to the identity of phonemes vary as a function of context.
TL;DR: This work shows the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain, and shows that this problem appears in a wide variety of practical ML pipelines.
Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains. This ambiguity can lead to instability and poor model behavior in practice, and is a distinct failure mode from previously identified issues arising from structural mismatch between training and deployment domains. We show that this problem appears in a wide variety of practical ML pipelines, using examples from computer vision, medical imaging, natural language processing, clinical risk prediction based on electronic health records, and medical genomics. Our results show the need to explicitly account for underspecification in modeling pipelines that are intended for real-world deployment in any domain.
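The definition in this abstract (many predictors with equally strong held-out performance in the training domain, yet different behavior in deployment) can be illustrated with a toy sketch. Everything here is hypothetical and not taken from the paper: the two hand-written predictors, the two-feature inputs, and the tiny datasets are assumptions chosen only to make the effect visible.

```python
# Toy illustration of underspecification (all names and data are hypothetical).
# Each example is ((feature_0, feature_1), label).

def predictor_a(x):
    # Relies only on feature 0.
    return x[0] > 0

def predictor_b(x):
    # Relies only on feature 1.
    return x[1] > 0

def accuracy(predictor, data):
    # Fraction of examples whose predicted label matches the true label.
    return sum(predictor(x) == y for x, y in data) / len(data)

# Held-out data from the training domain: both features track the label,
# so the data cannot tell the two predictors apart.
heldout = [((1, 1), True), ((-1, -1), False), ((2, 3), True), ((-2, -1), False)]

# Deployment data: feature 1 no longer tracks the label.
deployment = [((1, -1), True), ((-1, 1), False), ((2, -2), True), ((-3, 1), False)]

heldout_scores = (accuracy(predictor_a, heldout), accuracy(predictor_b, heldout))
deployment_scores = (accuracy(predictor_a, deployment), accuracy(predictor_b, deployment))
```

In this sketch the two predictors are indistinguishable on the held-out set but diverge completely on the deployment set, which is the kind of ambiguity the abstract describes. Real pipelines would differ in subtler ways (for example across random seeds), so this should be read as an intuition aid only.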
TL;DR: It is argued that the total body of evidence supports a model in which phonetic and cognitive pressures incrementally affect the lexicon, and phonotactic constraints are abstractions over the lexicon of phonological forms.
Abstract: It has long been known that verbal roots containing homorganic consonant pairs are rare in Arabic, motivating the existence of an OCP-Place constraint (Obligatory Contour Principle on place of articulation) in the phonological grammar. We explore this constraint using an on-line lexicon of Arabic roots. The strength of the constraint is quantified by the ratio of the observed number of examples of each consonant pair to the number that would be statistically expected under random combination of phonemes. We show that the strength of the effect over all pairs is a gradient function of the similarity of the consonants in the pair. A similarity metric based on natural classes is developed, which solves the formal difficulties of contrastive underspecification theory while preserving the insight that contrastiveness plays an important role in perceived similarity. This metric is applied in an explicit model of the gradient OCP constraint, which achieves a better fit to the regularities and sub-regularities of the Arabic verbal lexicon than any prior approach. Lastly, we review evidence for the psychological reality of the constraint, for its existence in related forms in other languages, and for its cognitive/phonetic foundations in the speech processing system. We argue that the total body of evidence supports a model in which phonetic and cognitive pressures incrementally affect the lexicon, and phonotactic constraints are abstractions over the lexicon of phonological forms.
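The observed-to-expected ratio described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it looks only at the first two consonants of each root, it estimates the expected count for a pair as N · P(first) · P(second) from positional frequencies, and the function name `observed_expected` and the four toy roots are hypothetical.

```python
from collections import Counter

def observed_expected(roots):
    """Return O/E ratios for (first, second) consonant pairs.

    Assumption: expected count = n * P(first consonant) * P(second consonant),
    i.e. random combination of the positional frequencies.
    """
    n = len(roots)
    first = Counter(r[0] for r in roots)
    second = Counter(r[1] for r in roots)
    pairs = Counter((r[0], r[1]) for r in roots)
    ratios = {}
    for a in first:
        for b in second:
            expected = first[a] * second[b] / n
            # Counter returns 0 for pairs that never occur.
            ratios[(a, b)] = pairs[(a, b)] / expected
    return ratios

# Hypothetical toy roots, not real lexicon data.
toy_roots = [("k", "t", "b"), ("k", "t", "m"), ("b", "t", "k"), ("b", "k", "r")]
ratios = observed_expected(toy_roots)
```

A ratio below 1 means the pair is under-represented relative to chance and a ratio of 0 means it never occurs. In the toy data the identical pair ("k", "k") gets 0, mirroring the direction of the effect the abstract reports, though four roots obviously carry no statistical weight.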
TL;DR: The authors explore four variables that contribute, to different extents depending on the nature of the interface, to the vulnerability of interface phenomena in bilingual speakers: underspecification, cross-linguistic influence, quantity and quality of the input, and processing limitations.
Abstract: This article deals with the interface between syntax and discourse pragmatics/semantics in bilingual speakers. Linguistic phenomena at the interface have been shown to be especially vulnerable in both child and adult bilinguals; here we explore four variables that contribute to this vulnerability to different extents depending on the nature of the interface: underspecification, cross-linguistic influence, quantity and quality of the input, and processing limitations. We investigate the role played by the aforementioned variables in two recently completed studies. One compares the performance of English–Italian and Spanish–Italian bilingual children, monolingual English- and Italian-speaking children and adults on forced-choice grammaticality tasks on the distribution of overt and null subject pronouns in Italian and in English. The second explores bilingual and monolingual speakers’ sensitivity to the presence of definite articles in specific and generic plural noun phrases in Italian and in English. We show that over and above structural overlap, other factors must be included to account for differences in the behavioural data in the two tasks and in different populations of bilinguals and monolinguals. We argue that processing factors play a non-trivial role in the difficulty encountered by bilinguals in coordinating syntax with contextual discourse-pragmatic information, regardless of the absence or presence of partial structural overlap. In the case of the internal coordination between syntax and semantics, processing factors may be less likely to affect bilinguals’ performance, while the extent of structural overlap and the associated internal formal features seem to play a more important role.
TL;DR: An evaluation metric in Universal Grammar provides a means of selecting between possible grammars for a particular language and the general idea of underspecification has always been a part of any theory of phonology that includes such an evaluation metric.
Abstract: An evaluation metric in Universal Grammar provides a means of selecting between possible grammars for a particular language. The evaluation metric as conceived in Chomsky & Halle (1968; henceforth SPE) prefers the grammar in which only the idiosyncratic properties are lexically listed and predictable properties are derived. The essence of underspecification theory is to supply such predictable distinctive features or feature specifications by rule. Viewed in this way, the general idea of underspecification has always been a part of any theory of phonology that includes such an evaluation metric.