About: Native-language identification is a research topic. Over its lifetime, 186 publications on this topic have received 3,580 citations.
TL;DR: This edited volume surveys learner corpus research, past, present and future, in five parts: corpus design and methodology, analysis of learner language, second language acquisition, language teaching, and natural language processing, closing with a chapter on native language identification.
Abstract: 1. Introduction: learner corpus research - past, present and future (Sylviane Granger, Gaëtanelle Gilquin and Fanny Meunier).
Part I. Learner Corpus Design and Methodology: 2. From design to collection of learner corpora (Gaëtanelle Gilquin); 3. Learner corpus methodology (Marcus Callies); 4. Learner corpora and psycholinguistics (Philip Durrant and Anna Siyanova-Chanturia); 5. Annotating learner corpora (Bertus van Rooy); 6. Speech annotation of learner corpora (Nicolas Ballier and Philippe Martin); 7. Error annotation systems (Anke Lüdeling and Hagen Hirschmann); 8. Statistics for learner corpus research (Stefan Th. Gries).
Part II. Analysis of Learner Language: 9. Learner corpora and lexis (Tom Cobb and Marlise Horst); 10. Learner corpora and phraseology (Signe Oksefjell Ebeling and Hilde Hasselgård); 11. Learner corpora and grammar (Tom Rankin); 12. Learner corpora and discourse (JoAnne Neff-van Aertselaer); 13. Learner corpora and pragmatics (Nina Vyatkina and Joseph Cunningham).
Part III. Learner Corpus Research and Second Language Acquisition: 14. Second language acquisition theory and learner corpus research (Florence Myles); 15. Transfer and learner corpus research (John Osborne); 16. Learner corpora and formulaic language in second language acquisition research (Nick C. Ellis, Rita Simpson-Vlach, Ute Römer, Matthew Brook O'Donnell and Stefanie Wulff); 17. Developmental patterns in learner corpora (Fanny Meunier); 18. Variability in learner corpora (Annelie Ädel); 19. Learner corpora and learning context (Joybrato Mukherjee and Sandra Götz).
Part IV. Learner Corpus Research and Language Teaching: 20. The learner corpus as a pedagogic corpus (Angela Chambers); 21. Learner corpora and language for academic and specific purposes (Lynne Flowerdew); 22. The contribution of learner corpora to reference and instructional materials design (Sylviane Granger); 23. Learner corpora and language testing (Fiona Barker, Angeliki Salamoura and Nick Saville).
Part V. Learner Corpus Research and Natural Language Processing: 24. Learner corpora and natural language processing (Detmar Meurers); 25. Automatic grammar- and spell-checking for language learners (Claudia Leacock, Martin Chodorow and Joel Tetreault); 26. Learner corpora and automated scoring (Derrick Higgins, Chaitanya Ramineni and Klaus Zechner); 27. Learner corpora and native language identification (Scott Jarvis and Magali Paquot).
TL;DR: The INTERSPEECH 2016 Computational Paralinguistics Challenge addresses three different problems for the first time in research competition under well-defined conditions: classification of deceptive vs. non-deceptive speech, the estimation of the degree of sincerity, and the identification of the native language out of 11 L1 classes of English L2 speakers.
Abstract: The INTERSPEECH 2016 Computational Paralinguistics Challenge addresses three different problems for the first time in research competition under well-defined conditions: classification of deceptive vs. non-deceptive speech, the estimation of the degree of sincerity, and the identification of the native language out of 11 L1 classes of English L2 speakers. In this paper, we describe these sub-challenges, their conditions, and the baseline feature extraction and classifiers, as provided to the participants.
TL;DR: Differences in syntactic complexity in English writing among college-level writers with different first language (L1) backgrounds are explored, and the implications of the varied patterns for L2 writing research and pedagogy and for automatic native language identification of learner texts are considered.
TL;DR: A new corpus of non-native English writing will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring.
Abstract: This report presents work on the development of a new corpus of non-native English writing. It will be useful for the task of native language identification, as well as grammatical error detection and correction, and automatic essay scoring. In this report, the corpus is described in detail.
TL;DR: The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy, and multiple classifier systems were the most effective in all tasks, with most based on traditional classifiers with lexical/syntactic features.
Abstract: Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2013) and spoken responses (2016) they provided during a standardized assessment of academic English proficiency. The 2017 shared task combines the inputs from the two prior tasks for the first time. There are three tracks: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response and i-vector acoustic features), and NLI using both responses. We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks. In this paper, we report the results of the shared task. A total of 19 teams competed across the three different sub-tasks. The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems (e.g. ensembles and meta-classifiers) were the most effective in all tasks, with most based on traditional classifiers (e.g. SVMs) with lexical/syntactic features.
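The closed-set framing described in this abstract (a fixed set of L1s, surface lexical features, a traditional classifier) can be illustrated with a minimal sketch. It is not any shared-task system: a nearest-centroid classifier over character trigram counts stands in for the SVMs mentioned, and the class name `CentroidNLI`, the labels `L1_A` and `L1_B`, and the training sentences are all invented for illustration rather than taken from a learner corpus.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, a simple surface-level lexical feature."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CentroidNLI:
    """Hypothetical closed-set classifier: one summed n-gram profile per known L1."""

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            # Accumulate the n-gram counts of every training text of this L1.
            self.centroids.setdefault(label, Counter()).update(char_ngrams(text))
        return self

    def predict(self, text):
        feats = char_ngrams(text)
        # The answer is always one of the L1 labels seen in training (closed set).
        return max(sorted(self.centroids),
                   key=lambda label: cosine(feats, self.centroids[label]))


# Invented toy data; the labels are placeholders, not real L1 groups.
train_texts = [
    "i have yesterday the book read",
    "he has me the letter given",
    "i am agree with this opinion",
    "she has twenty years old",
]
train_labels = ["L1_A", "L1_A", "L1_B", "L1_B"]

model = CentroidNLI().fit(train_texts, train_labels)
predicted = model.predict("he has the book given")
```

A multiple classifier system of the kind the abstract reports as most effective would train several such models on different feature types (for example word n-grams or part-of-speech n-grams) and combine their predictions by voting or with a meta-classifier.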