Data-oriented parsing

Topic Tools

Papers published on a yearly basis

Papers

Book•

Foundations of Statistical Natural Language Processing

[...]

Christopher D. Manning¹, Hinrich Schütze²•Institutions (2)

Stanford University¹, PARC²

28 May 1999

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.

...read moreread less

Abstract: Statistical approaches to processing natural language text have become dominant in recent years This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear The book contains all the theory and algorithms needed for building NLP tools It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications

...read moreread less

10,966 citations

Journal Article•10.1162/089120103322753356•

Head-Driven Statistical Models for Natural Language Parsing

[...]

Michael Collins¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Dec 2003-Computational Linguistics

TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.

...read moreread less

Abstract: This article describes three statistical models for natural language parsing. The models extend methods from probabilistic context-free grammars to lexicalized grammars, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree. Independence assumptions then lead to parameters that encode the X-bar schema, subcategorization, ordering of complements, placement of adjuncts, bigram lexical dependencies, wh-movement, and preferences for close attachment. All of these preferences are expressed by probabilities conditioned on lexical heads. The models are evaluated on the Penn Wall Street Journal Treebank, showing that their accuracy is competitive with other models in the literature. To gain a better understanding of the models, we also give results on different constituent types, as well as a breakdown of precision/recall results in recovering various types of dependencies. We analyze various characteristics of the models through experiments on parsing accuracy, by collecting frequencies of various structures in the treebank, and through linguistically motivated examples. Finally, we compare the models to others that have been applied to parsing the treebank, aiming to give some explanation of the difference in performance of the various models.

...read moreread less

2,081 citations

Proceedings Article•

A maximum-entropy-inspired parser

[...]

Eugene Charniak¹•Institutions (1)

Brown University¹

29 Apr 2000

TL;DR: A new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less and 89.5% when trained and tested on the previously established sections of the Wall Street Journal treebank is presented.

...read moreread less

Abstract: We present a new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less when trained and tested on the previously established [5, 9, 10, 15, 17] "standard" sections of the Wall Street Journal treebank. This represents a 13% decrease in error rate over the best single-parser results on this corpus [9]. The major technical innovation is the use of a "maximum-entropy-inspired" model for conditioning and smoothing that let us successfully to test and combine many different conditioning events. We also present some partial results showing the effects of different conditioning information, including a surprising 2% improvement due to guessing the lexical head's pre-terminal before guessing the lexical head.

...read moreread less

1,812 citations

Proceedings Article•10.3115/981658.981695•

Statistical Decision-Tree Models for Parsing

[...]

David M. Magerman¹•Institutions (1)

BBN Technologies¹

26 Jun 1995

TL;DR: SPATTER as discussed by the authors is a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result, which is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and existing n-gram modeling techniques are inadequate for parsing models.

...read moreread less

Abstract: Syntactic natural language parsers have shown themselves to be inadequate for processing highly-ambiguous large-vocabulary text, as is evidenced by their poor performance on domains like the Wall Street Journal, and by the movement away from parsing-based approaches to text-processing in general. In this paper, I describe SPATTER, a statistical parser based on decision-tree learning techniques which constructs a complete parse for every sentence and achieves accuracy rates far better than any published result. This work is based on the following premises: (1) grammars are too complex and detailed to develop manually for most interesting domains; (2) parsing models must rely heavily on lexical and contextual information to analyze sentences accurately; and (3) existing n-gram modeling techniques are inadequate for parsing models. In experiments comparing SPATTER with IBM's computer manuals parser, SPATTER significantly outperforms the grammar-based parser. Evaluating SPATTER against the Penn Treebank Wall Street Journal corpus using the PARSEVAL measures, SPATTER achieves 86% precision, 86% recall, and 1.3 crossing brackets per sentence for sentences of 40 words or less, and 91% precision, 90% recall, and 0.5 crossing brackets for sentences between 10 and 20 words in length.

...read moreread less

659 citations

Proceedings Article•

Statistical parsing with a context-free grammar and word statistics

[...]

Eugene Charniak¹•Institutions (1)

Brown University¹

27 Jul 1997

TL;DR: A parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence that outperforms previous schemes is described.

...read moreread less

Abstract: We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previous schemes. As this is the third in a series of parsers by different authors that are similar enough to invite detailed comparisons but different enough to give rise to different levels of performance, we also report on some experiments designed to identify what aspects of these systems best explain their relative performance.

...read moreread less

644 citations

...

Expand

Year	Papers
2016	1
2014	1
2013	3
2012	1
2011	3
2010	1

Topic Tools

Papers published on a yearly basis

Papers

Foundations of Statistical Natural Language Processing

Head-Driven Statistical Models for Natural Language Parsing

A maximum-entropy-inspired parser

Statistical Decision-Tree Models for Parsing

Statistical parsing with a context-free grammar and word statistics

Related Topics (5)

Performance Metrics