TL;DR: Hamers and Blanc present state-of-the-art knowledge about languages in contact, from individual bilingualism (or bilinguality) to societal bilingualism, and analyse bilingualism at individual, interpersonal, and societal levels.
Abstract: This updated and revised edition of Hamers and Blanc's successful textbook presents state-of-the-art knowledge about languages in contact from individual bilingualism (or bilinguality) to societal bilingualism. It is both multi- and interdisciplinary in approach, and analyses bilingualism at individual, interpersonal, and societal levels. Linguistic, cognitive and sociocultural aspects of bilingual development are explored, as are problems such as bilingual memory and polyglot aphasia. Hamers and Blanc analyse the relationship between culture, identity, and language behaviour in multicultural settings, as well as the communication strategies in interpersonal and intergroup relations. They also propose theoretical models of language processing and development, which are then applied to bilingual behaviour. Other topics reviewed include language shift, pidgins and creoles, language planning and bilingual education. This book will be invaluable to students, teachers and scholars interested in languages in contact in a range of disciplines including psycholinguistics, linguistics, the social sciences, education and language planning.
TL;DR: The authors trained word embeddings for more than 100 languages on the corresponding Wikipedias and, using them as the sole features for a part-of-speech tagger, found their performance to be competitive with near state-of-the-art methods in English, Danish and Swedish.
Abstract: Distributed word representations (word embeddings) have recently contributed to competitive performance in language modeling and several NLP tasks. In this work, we train word embeddings for more than 100 languages using their corresponding Wikipedias. We quantitatively demonstrate the utility of our word embeddings by using them as the sole features for training a part-of-speech tagger for a subset of these languages. We find their performance to be competitive with near state-of-the-art methods in English, Danish and Swedish. Moreover, we investigate the semantic features captured by these embeddings through the proximity of word groupings. We will release these embeddings publicly to help researchers in the development and enhancement of multilingual applications.
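Note: the "proximity of word groupings" mentioned in this abstract is commonly inspected by listing a word's nearest neighbours under cosine similarity. The sketch below illustrates that general idea only. It is not the authors' code, the three-dimensional vectors in `TOY` are invented for the example, and the helper names `cosine` and `nearest` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word, embeddings, k=2):
    """Return the k words whose vectors are closest to `word` by cosine similarity."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


# Made-up toy vectors (assumption): real embeddings have many more dimensions.
TOY = {
    "apple": [0.9, 0.1, 0.0],
    "banana": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.2],
    "truck": [0.0, 0.8, 0.3],
}

print(nearest("apple", TOY, k=1))
print(nearest("car", TOY, k=1))
```

With these toy vectors, the fruit words end up next to each other and the vehicle words next to each other, which is the kind of grouping such an inspection looks for.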
TL;DR: This paper focuses on the design choices in Polyglot that are important for making the framework usable and highly extensible.
Abstract: Polyglot is an extensible compiler framework that supports the easy creation of compilers for languages similar to Java, while avoiding code duplication. The Polyglot framework is useful for domain-specific languages, exploration of language design, and for simplified versions of Java for pedagogical use. We have used Polyglot to implement several major and minor modifications to Java; the cost of implementing language extensions scales well with the degree to which the language differs from Java. This paper focuses on the design choices in Polyglot that are important for making the framework usable and highly extensible. Polyglot source code is available.
TL;DR: In this survey, existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses are reviewed.
Abstract: Contextual embeddings, such as ELMo and BERT, move beyond global word representations like Word2Vec and achieve ground-breaking performance on a wide range of natural language processing tasks. Contextual embeddings assign each word a representation based on its context, thereby capturing uses of words across varied contexts and encoding knowledge that transfers across languages. In this survey, we review existing contextual embedding models, cross-lingual polyglot pre-training, the application of contextual embeddings in downstream tasks, model compression, and model analyses.