Prototype-Driven Grammar Induction
Aria Haghighi, Dan Klein
- 17 Jul 2006
- pp 881-888
TL;DR: To improve the quality of the induced trees, this work combines PCFG induction with the CCM model of Klein and Manning (2002), which has complementary strengths: it identifies brackets but does not label them.
Abstract: We investigate prototype-driven learning for primarily unsupervised grammar induction. Prior knowledge is specified declaratively, by providing a few canonical examples of each target phrase type. This sparse prototype information is then propagated across a corpus using distributional similarity features, which augment an otherwise standard PCFG model. We show that distributional features are effective at distinguishing bracket labels, but not at determining bracket locations. To improve the quality of the induced trees, we combine our PCFG induction with the CCM model of Klein and Manning (2002), which has complementary strengths: it identifies brackets but does not label them. Using only a handful of prototypes, we show substantial improvements over naive PCFG induction for English and Chinese grammar induction.
Citations
•Proceedings Article
Painless Unsupervised Learning with Features
Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, Dan Klein
- 02 Jun 2010
TL;DR: This work shows how features can easily be added to standard generative models for unsupervised learning, without requiring complex new training methods, and applies this technique to part-of-speech induction, grammar induction, word alignment, and word segmentation.
Semi-supervised Learning of Dependency Parsers using Generalized Expectation Criteria
Gregory Druck, Gideon S. Mann, Andrew McCallum
- 02 Aug 2009
TL;DR: This work presents a novel method for semi-supervised learning of non-projective log-linear dependency parsers using directly expressed linguistic prior knowledge (e.g. a noun's parent is often a verb), and shows that generalized expectation (GE) criteria can attain better accuracy with as few as 20 intuitive constraints.
•Proceedings Article
Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
Emily M. Bender, Dan Flickinger, Stephan Oepen, Yi Zhang
- 27 Jul 2011
TL;DR: This work studies 100 examples each of ten reasonably frequent linguistic phenomena, randomly selected from a parsed version of the English Wikipedia, and constructs a corresponding set of gold-standard target dependencies for these 1000 sentences to measure parser accuracy over naturally occurring text.
Language ID in the Context of Harvesting Language Data off the Web
Fei Xia, William Lewis, Hoifung Poon
- 30 Mar 2009
TL;DR: It is argued that language ID is far from solved when one considers input spanning not dozens of languages, but rather hundreds to thousands, a number that one approaches when harvesting language data found on the Web.
References
•Book
An Introduction to Functional Grammar
Michael Halliday
- 01 Jan 1985
TL;DR: Part 1, The clause: constituency; towards a functional grammar; clause as message; clause as exchange; clause as representation. Above, below and beyond the clause: below the clause (groups and phrases); above the clause (the clause complex); additional.
14K
•Book
Foundations of Statistical Natural Language Processing
Christopher D. Manning, Hinrich Schütze
- 28 May 1999
TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Building a Large Annotated Corpus of English: The Penn Treebank
TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.
Distributional Structure
TL;DR: This work discusses how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and how this description is complete without intrusion of other features such as history or meaning.
4.2K
Distributional Structure
Zellig S. Harris
- 01 Jan 1981
TL;DR: This work discusses how each language can be described in terms of a distributional structure, i.e. in terms of the occurrence of parts relative to other parts, and how this description is complete without intrusion of other features such as history or meaning.
3.6K