Open Access • Proceedings Article
A Novel Evaluation Method for Morphological Segmentation.
Javad Nouri, Roman Yangarber +1 more
- 01 May 2016
- pp 3102-3109
TL;DR: This work introduces a new evaluation methodology, which enforces correctness of segmentation boundaries while also assuring consistency of segmentation decisions across the corpus.
Abstract: Unsupervised learning of the morphological segmentation of words in a language, based only on a large corpus of words, is a challenging task. Evaluating the learned segmentations is a challenge in itself because the segmentation task is inherently ambiguous: there is no objective way to posit a unique “correct” segmentation for a set of data. Two models may segment the data differently, and both segmentations may nonetheless be valid. Several evaluation methods have been proposed to date, but they do not insist on consistency of the evaluated model. We introduce a new evaluation methodology, which enforces correctness of segmentation boundaries while also assuring consistency of segmentation decisions across the corpus.
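For context, a common way to score segmentation boundaries against a gold standard is boundary-level precision and recall. The sketch below is a minimal illustration of that general idea only; it is not the methodology this paper proposes. The `+` separator, the function names `boundaries` and `boundary_prf`, and the toy Finnish-like words are all assumptions made for the example.

```python
from typing import Sequence, Set, Tuple


def boundaries(segmentation: str, sep: str = "+") -> Set[int]:
    """Return character offsets of the internal morph boundaries.

    "talo+i+ssa" has boundaries after 4 and 5 characters of the
    unsegmented word "taloissa". The separator is an assumed convention.
    """
    positions: Set[int] = set()
    offset = 0
    morphs = segmentation.split(sep)
    for morph in morphs[:-1]:  # no boundary after the last morph
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_prf(
    predicted: Sequence[str], gold: Sequence[str]
) -> Tuple[float, float, float]:
    """Micro-averaged boundary precision, recall and F1 over a word list.

    Both sequences must list the same words in the same order.
    """
    tp = fp = fn = 0
    for pred_seg, gold_seg in zip(predicted, gold):
        pred_b, gold_b = boundaries(pred_seg), boundaries(gold_seg)
        tp += len(pred_b & gold_b)   # boundaries found in both
        fp += len(pred_b - gold_b)   # predicted but not in gold
        fn += len(gold_b - pred_b)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (hypothetical): the model misses one boundary in "taloissa".
gold = ["talo+ssa", "talo+i+ssa", "auto+ssa"]
predicted = ["talo+ssa", "taloi+ssa", "auto+ssa"]
precision, recall, f1 = boundary_prf(predicted, gold)
print(precision, recall, f1)
```

A score of this kind is computed word by word, so on its own it does not check whether the same morph is segmented the same way across the corpus, which is the consistency property the abstract emphasizes.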