Minimally-Supervised Morphological Segmentation using Adaptor Grammars
Kairit Sirts, Sharon Goldwater
TL;DR: This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation, and shows that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems.
Abstract: This paper explores the use of Adaptor Grammars, a nonparametric Bayesian modelling framework, for minimally supervised morphological segmentation. We compare three training methods: unsupervised training, semi-supervised training, and a novel model selection method. In the model selection method, we train unsupervised Adaptor Grammars using an over-articulated metagrammar, then use a small labelled data set to select which potential morph boundaries identified by the metagrammar should be returned in the final output. We evaluate on five languages and show that semi-supervised training provides a boost over unsupervised training, while the model selection method yields the best average results over all languages and is competitive with state-of-the-art semi-supervised systems. Moreover, this method provides the potential to tune performance according to different evaluation metrics or downstream tasks.
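The model selection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the metagrammar parse of each word has already been reduced to sets of candidate boundary positions grouped by a level name, and the level names (`Compound`, `Morph`, `SubMorph`), the example words, and the functions `boundary_f1` and `select_levels` are all hypothetical. The sketch simply tries every non-empty combination of levels and keeps the one whose boundaries score best against a small labelled set.

```python
from itertools import combinations


def boundary_f1(predicted, gold):
    """Micro-averaged F1 over boundary positions, summed across all words."""
    tp = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def select_levels(candidates, gold, levels, score_fn=boundary_f1):
    """Return the subset of levels whose merged boundaries score best on gold.

    candidates: one dict per word, mapping level name -> set of boundary positions.
    gold: one set of boundary positions per word (the small labelled set).
    """
    best_score, best_subset = -1.0, ()
    for k in range(1, len(levels) + 1):
        for subset in combinations(levels, k):
            predicted = [
                set().union(*(word.get(level, set()) for level in subset))
                for word in candidates
            ]
            score = score_fn(predicted, gold)
            # Strict comparison keeps the smallest, earliest subset on ties.
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score


# Hypothetical candidate boundaries for "walking" and "doghouses".
levels = ["Compound", "Morph", "SubMorph"]
candidates = [
    {"Compound": set(), "Morph": {4}, "SubMorph": {2, 4}},
    {"Compound": {3}, "Morph": {3, 8}, "SubMorph": {3, 5, 8}},
]
gold = [{4}, {3, 8}]  # walk|ing, dog|house|s

best_subset, best_score = select_levels(candidates, gold, levels)
print(best_subset, best_score)
```

Because the scoring function is passed in as `score_fn`, a different metric (for example, one weighted toward precision) could be substituted, which loosely mirrors the abstract's remark about tuning to different evaluation metrics or downstream tasks.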
Citations
Case Studies in the Automatic Characterization of Grammars from Small Wordlists
Jordan Kodner, Spencer Kaplan, Hongzhi Xu, Mitchell Marcus, Charles Yang
- 01 Mar 2017
TL;DR: Two novel examples of simple algorithms which characterize the grammars of low-resource languages are presented: a tool for the characterization of vowel harmony, and a framework for unsupervised morphological segmentation which achieves state-of-the-art performance.
Rage against the machine: Evaluation metrics in the 21st century
TL;DR: The authors review the classic literature in generative grammar and Marr's three-level program for cognitive science to defend the Evaluation Metric as a psychological theory of language learning.
•Proceedings Article
Morphological segmentation with window LSTM neural networks
Linlin Wang, Zhu Cao, Yu Xia, Gerard de Melo
- 12 Feb 2016
TL;DR: Novel neural network architectures that learn the structure of input sequences directly from raw input words and are subsequently able to predict morphological boundaries are proposed.
Non-Parametric Bayesian Models for Computational Morphology
Kairit Sirts, Sharon Goldwater, Leo Võhandu
- 18 Jun 2015
A comparative study of minimally supervised morphological segmentation
TL;DR: A comparative study of a subfield of morphology learning referred to as minimally supervised morphological segmentation concludes that the existing methodology contains substantial work on generative morph lexicon-based approaches and methods based on discriminative boundary detection.
References
The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator
Jim Pitman, Marc Yor
TL;DR: The size-biased random permutation of the two-parameter Poisson-Dirichlet distribution is a simple residual allocation model, introduced by Engen in the context of species diversity and rediscovered by Perman and the authors in the study of excursions of Bessel processes.
Unsupervised learning of the morphology of a natural language
TL;DR: This study reports the results of using minimum description length (MDL) analysis to model unsupervised learning of the morphological segmentation of European languages, using corpora ranging in size from 5,000 words to 500,000 words.
Unsupervised models for morpheme segmentation and morphology learning
Mathias Creutz, Krista Lagus
TL;DR: Morfessor can handle highly inflecting and compounding languages where words can consist of lengthy sequences of morphemes and is shown to perform very well compared to a widely known benchmark algorithm on Finnish data.
•Proceedings Article
Interpolating between types and tokens by estimating power-law generators
Sharon Goldwater, Mark Johnson, Thomas L. Griffiths
- 05 Dec 2005
TL;DR: It is shown that taking a particular stochastic process - the Pitman-Yor process - as an adaptor justifies the appearance of type frequencies in formal analyses of natural language, and improves the performance of a model for unsupervised learning of morphology.
•Proceedings Article
Unsupervised Multilingual Learning for Morphological Segmentation
Benjamin Snyder, Regina Barzilay
- 01 Jun 2008
TL;DR: A nonparametric Bayesian model is presented that jointly induces morpheme segmentations of each language under consideration and at the same time identifies cross-lingual morpheme patterns, or abstract morphemes, of multiple languages.