Proceedings Article, DOI: 10.3115/1613704.1613708
Semantics-based Multiword Expression Extraction
Tim Van de Cruys, Begoña Villada Moirón
- 28 Jun 2007
- pp 25-32
TL;DR: A fully unsupervised, automated method for large-scale extraction of multiword expressions (MWEs) from large corpora, built on statistical measures that formalize the intuition that MWEs are non-compositional.
Abstract: This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims to capture the non-compositionality of MWEs; the intuition is that a noun within an MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted using distributional similarity measures, which yields clusters of semantically related nouns. Next, a number of statistical measures based on selectional preferences are developed that formalize the intuition of non-compositionality. The approach has been tested on Dutch and automatically evaluated using Dutch lexical resources.
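The substitution intuition in the abstract can be illustrated with a small sketch. This is not the paper's measure, which is based on selectional preferences; it is a simplified stand-in that asks what share of a verb's co-occurrences with a noun cluster falls on one particular noun. The function name `substitution_ratio`, the toy counts, and the English example cluster are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def substitution_ratio(counts, cluster, verb, noun):
    """Share of the verb's co-occurrences with `cluster` that fall on `noun`.

    A value near 1.0 means the verb almost never combines with the other,
    semantically similar nouns in the cluster, which hints that the noun
    is hard to replace (the non-compositionality intuition).
    Hypothetical helper, not the measure defined in the paper.
    """
    total = sum(counts[(verb, other)] for other in cluster)
    if total == 0:
        return 0.0
    return counts[(verb, noun)] / total


# Toy data (invented): one cluster of related nouns and verb-noun counts.
cluster = {"bucket", "pail", "tub"}
counts = Counter({
    ("kick", "bucket"): 8,
    ("fill", "bucket"): 5,
    ("fill", "pail"): 3,
    ("fill", "tub"): 2,
})

idiom_score = substitution_ratio(counts, cluster, "kick", "bucket")
literal_score = substitution_ratio(counts, cluster, "fill", "bucket")
print(idiom_score, literal_score)
```

In this toy setting, "kick" combines only with "bucket", so its ratio is higher than that of "fill", which spreads across the whole cluster. A real system would need frequency thresholds and a measure that is robust to sparse counts.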
Citations
Unsupervised type and token identification of idiomatic expressions
TL;DR: This article develops statistical measures that each model a specific property of idiomatic expressions by looking at their actual usage patterns in text, and applies some of these measures to a token identification task, distinguishing idiomatic from literal usages of potentially idiomatic expressions in context.
Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources
Yulia Tsvetkov, Shuly Wintner
- 27 Jul 2011
TL;DR: This work defines various linguistically motivated classification features and introduces novel ways of computing them. It manually defines interrelationships among the features and expresses them in a Bayesian network, yielding a classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora.
Extraction of multi-word expressions from small parallel corpora
Yulia Tsvetkov, Shuly Wintner
TL;DR: This article proposes a method for extracting multi-word expressions (MWEs) of various types, along with their translations, from small, word-aligned parallel corpora. It focuses on misalignments, which typically indicate expressions in the source language that are translated into the target language in a non-compositional way.
DuELME: a Dutch electronic lexicon of multiword expressions
Nicole Grégoire
- 01 Apr 2010
TL;DR: Introducing parameters to the ECM is shown to optimize the method; the extraction of candidate expressions from corpora and the selection criteria for the lexical entries are also discussed.
References
Some methods for classification and analysis of multivariate observations
James B. MacQueen
- 01 Jan 1967
TL;DR: The k-means procedure described in this paper partitions an N-dimensional population into k sets on the basis of a sample. The k-means concept generalizes the ordinary sample mean, and the procedure is shown to give partitions that are reasonably efficient in the sense of within-class variance.
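As a rough illustration of the partitioning idea summarized above, here is a minimal sketch of a sequential k-means pass: the first k points seed the means, and each later point is assigned to its nearest mean, which is then updated incrementally. The function name `sequential_k_means` and the toy points are assumptions made for this example; the sketch is not a reproduction of the procedure or analysis in the original paper.

```python
def sequential_k_means(points, k):
    """One sequential pass of k-means over `points` (tuples of floats).

    The first k points serve as the initial means. Every later point is
    assigned to the nearest mean (squared Euclidean distance), and that
    mean is updated as the running average of its assigned points.
    """
    means = [list(p) for p in points[:k]]
    sizes = [1] * k
    for p in points[k:]:
        # Index of the closest current mean.
        j = min(
            range(k),
            key=lambda i: sum((a - m) ** 2 for a, m in zip(p, means[i])),
        )
        sizes[j] += 1
        # Incremental update of the running mean.
        means[j] = [m + (a - m) / sizes[j] for a, m in zip(p, means[j])]
    return means


# Toy data (invented): two well-separated groups in the plane.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
means = sequential_k_means(points, 2)
print(means)
```

The result depends on the order of the points and on which points seed the means, so in practice the data is usually shuffled or the procedure is run several times.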
Verb semantics and lexical selection
Zhibiao Wu, Martha Palmer
- 27 Jun 1994
Abstract: This paper will focus on the semantic representation of verbs in computer systems and its impact on lexical selection problems in machine translation (MT). Two groups of English and Chinese verbs are examined to show that lexical selection must be based on interpretation of the sentences as well as selection restrictions placed on the verb arguments. A novel representation scheme is suggested, and is compared to representations with selection restrictions used in transfer-based MT. We see our approach as closely aligned with knowledge-based MT approaches (KBMT), and as a separate component that could be incorporated into existing systems. Examples and experimental results will show that, using this scheme, inexact matches can achieve correct lexical selection.
Automatic Retrieval and Clustering of Similar Words
Dekang Lin
- 10 Aug 1998
TL;DR: A word similarity measure based on the distributional patterns of words is used to construct a thesaurus automatically; the resulting thesaurus is significantly closer to WordNet than Roget's Thesaurus is.