TL;DR: This paper addresses the notable lack of agreement among informed scholars on the classification of the Uto-Aztecan languages, a problem that revolves around the family-tree approach versus the wave or mesh approach (see Bloomfield 1933:311-18 and Swadesh 1959).
Abstract: There has been a notable lack of agreement among informed scholars on the classification of the Uto-Aztecan languages. The problem revolves around the family-tree approach versus the wave or mesh approach (see Bloomfield 1933:311-18 and Swadesh 1959). The family-tree approach assumes sudden splits within a dialect-free parent, while the wave approach assumes a dialect continuum which dissolves into distinct languages and in which the newly budded languages reflect the earlier dialect interrelationships. The wave principle operated to a greater extent in Uto-Aztecan than in some other families (e.g., Indo-European). The vexing and interesting problems for Uto-Aztecan are two: first, to what extent did the wave principle operate; and second, how are we to describe or represent the relationships that are difficult or impossible to represent by the traditional family-tree classification? The Uto-Aztecan family consists of about thirty languages, located in two main geographic areas: the northern one in southern California, the Great Basin, and nearby areas; and the southern one stretching from southern Arizona, through northwest Mexico, into central Mexico and beyond (see fig. 1). Those favoring greater importance for the family-tree approach recognize three branches: Shoshonean, Sonoran, and Nahuatl or Aztecan. A variant of this approach would group Sonoran and Aztecan into a single branch called Southern Uto-Aztecan (SUA), with Shoshonean then renamed Northern Uto-Aztecan (NUA). Those favoring greater importance for the wave approach view Uto-Aztecan as being composed of eight or more independent branches. The so-called Shoshonean, then, is viewed as consisting of four branches and Sonoran of three or more (the particular number varying somewhat among different investigators), with general but not universal agreement by both groups that Aztecan forms an independent branch.
TL;DR: This article studies the surface nubbliness of early Middle English writing systems: when complex systems are assigned geographical positions close to each other, and indeed close to where simpler, more economical systems are localized, the picture that emerges can appear haphazard and unlike the dialect continuum we would expect.
Abstract: There are two main strands to this paper. The first is that in Middle English – and early Middle English especially – there are many writing systems that are so complex as to seem disorderly. But a sympathetic and careful interpretation of these systems shows sophisticated underlying order. The second strand is related to the first: early Middle English writing systems are local and may be represented on maps. When complex systems are assigned geographical positions close to each other – and indeed close to where simpler, more economical systems are localized – a picture emerges that can appear haphazard and unlike the dialect continuum we would expect. We refer to this phenomenon as surface nubbliness. This masks the underlying regional dialect continuum we believe to have been present in the spoken language. But knowledge of how these written systems mapped symbol to sound may enable us to uncover a continuum at the level of sound substance.
TL;DR: NOAH’s Corpus of Swiss German Dialects, which consists of various text genres and is manually annotated with Part-of-Speech tags, was compiled to serve as a stepping stone toward automatic processing of the dialects.
Abstract: Swiss German is a dialect continuum whose dialects are very different from Standard German, the official language of the German part of Switzerland. However, when Swiss German is dealt with in natural language processing, a detour through Standard German is usually taken. As writing in Swiss German has become more and more popular in recent years, we would like to provide data to serve as a stepping stone to automatically process the dialects. We compiled NOAH’s Corpus of Swiss German Dialects, consisting of various text genres, manually annotated with Part-of-Speech tags. Furthermore, we used this corpus as a training set for a statistical Part-of-Speech tagger and achieved an accuracy of 90.62%.
TL;DR: This work investigates the development pattern of Chinese dialects using a neighbour-net approach, which is an unprejudiced technique for representing object relationships; the resulting graphs are consistent with a dialect continuum shaped by counterbalanced effects of homogenizing diglossia and borrowing versus differentiating spread of speech communities.
Abstract: As with species studied by evolutionary biologists, languages are evolving entities. They can evolve in tree-like patterns, possibly blurred by borrowing, but they can also develop in non-tree-like schemes. For instance, diglossia, as in the case of Chinese, can counterbalance the hierarchical pattern expected from differentiation by internal change associated with isolation by distance of speech communities. Using two lexical datasets, either the basic lexicon supposedly more immune to borrowing or a representative sample of the whole lexicon, we investigate the development pattern of Chinese dialects using a neighbour-net approach, which is an unprejudiced technique for representing object relationships. The resulting graphs are consistent with a dialect continuum shaped by counterbalanced effects of homogenizing diglossia and borrowing versus differentiating spread of speech communities. Historical events and linguistic claims can be mapped on these graphs.
TL;DR: In this article, the authors present a synchronic description of the Taleshi language spoken in northwest Iran and compare the basic phonological, morphological and syntactic structure of three dialects spoken in Iran: Anbarani (northern), Asalemi (central) and Masali (southern).
Abstract: This work presents a synchronic description of the Taleshi language spoken in northwest Iran. Its purpose is to provide a comparative study of the basic phonological, morphological and syntactic structure of three dialects spoken in Iran: Anbarani (northern), Asalemi (central) and Masali (southern). In addition, the sociolinguistic situation of the dialects is explored, along with some key elements of narrative discourse structure. To date only individual dialects of Iranian Taleshi have been described, mostly at the level of a grammatical sketch. This study, by comparing key representative speech varieties of each main dialect area, provides an overview of the whole dialect continuum, and is thereby able to show how the language changes from north to south. This variation has arisen partly as a result of language contact: the Taleshi language area is surrounded by other languages, including South Azerbaijani (Turkic), and Tati, Gilaki and Persian (all Western Iranian). Language shift to Persian is also occurring, and many Talesh no longer transmit their mother tongue to the next generation. The data for the study is drawn from fieldwork carried out in Iran during 2006 and 2007. This fieldwork included the elicitation of word and sentence lists, and the recording, transcription and translation of narrative texts in each dialect area. Further to these, a short film (The Pear Film) was used to elicit spontaneous narrative texts in nine locations along the dialect continuum; we therefore include some wider comment on other dialects of Iranian Taleshi.