Sample Selection for Statistical Parsing

doi:10.1162/0891201041850894

Open Access•Journal Article•10.1162/0891201041850894

Rebecca Hwa

- 01 Sep 2004

- Computational Linguistics

- Vol. 30, Iss: 3, pp 253-276

170

TL;DR: It is found that sample selection can significantly reduce the size of annotated training corpora and that uncertainty is a robust predictive criterion that can be easily applied to different learning models.

Citations

Active Learning Literature Survey

Burr Settles

- 01 Jan 2009

TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.

6.7K

•Proceedings Article•10.3115/1613715.1613855

An Analysis of Active Learning Strategies for Sequence Labeling Tasks

Burr Settles, +1 more

- 25 Oct 2008

TL;DR: This paper surveys previously used query selection strategies for sequence models, proposes several novel algorithms to address their shortcomings, and conducts a large-scale empirical comparison.

1.2K

An introduction to mathematical statistics and its applications / Richard J. Larsen, Morris L. Marx

Richard J. Larsen, +1 more

- 01 Jan 1981

TL;DR: In this article, Monte Carlo techniques are used to estimate the probability of a given set of variables for a particular set of classes of data, such as conditional probability and hypergeometric probability.

524

•Journal Article•10.1007/S10994-007-5019-5

Active learning for logistic regression: an evaluation

Andrew I. Schein, +1 more

- 01 Oct 2007

- Machine Learning

TL;DR: The variance reduction method known in experimental design circles as ‘A-optimality’ is re-derived, and comparisons against different variations of the most widely used heuristic schemes are run to discover which methods work best for different classes of problems and why.

415

•Dissertation

Semi-Supervised Learning for Natural Language

Percy Liang

- 01 Jan 2005

TL;DR: This thesis focuses on two segmentation tasks, named-entity recognition and Chinese word segmentation, and shows that features derived from unlabeled data substantially improve performance, both in terms of reducing the amount of labeled data needed to achieve a certain performance level and in terms of reducing the error using a fixed amount of labeled data.

374

References

•Book

Elements of information theory

Thomas M. Cover, +1 more

- 01 Jan 1991

TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.

52.2K

•Report•10.21236/ADA273556

Building a large annotated corpus of English: the Penn Treebank

Mitchell Marcus, +2 more

- 01 Jun 1993

- Computational Linguistics

TL;DR: As a result of this grant, the researchers have now published on CD-ROM a corpus of over 4 million words of running text annotated with part-of-speech (POS) tags, which includes a fully hand-parsed version of the classic Brown corpus.

9.2K

•Proceedings Article•10.1145/279943.279962

Combining labeled and unlabeled data with co-training

Avrim Blum, +1 more

- 24 Jul 1998

TL;DR: A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.

6.4K

•Journal Article•10.1162/089120103322753356

Head-Driven Statistical Models for Natural Language Parsing

Michael Collins

- 01 Dec 2003

- Computational Linguistics

TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.

2K

•Proceedings Article

A maximum-entropy-inspired parser

Eugene Charniak

- 29 Apr 2000

TL;DR: A new parser for parsing down to Penn tree-bank style parse trees that achieves 90.1% average precision/recall for sentences of length 40 and less and 89.5% when trained and tested on the previously established sections of the Wall Street Journal treebank is presented.

1.8K