Evaluating DUC 2005 using Basic Elements

Open Access

- 01 Jan 2005

115

TL;DR: It is shown that this method correlates better with human judgments than any other automated procedure to date, and overcomes the subjectivity/variability problems of manual methods that require humans to preprocess summaries to be evaluated.

Citations

•Proceedings Article•10.18653/V1/D15-1222

Better Summarization Evaluation with Word Embeddings for ROUGE

Jun-Ping Ng, +1 more

- 25 Aug 2015

TL;DR: Instead of measuring lexical overlap, this article uses word embeddings to compute the semantic similarity of the words used in summaries, which achieves better correlations with human judgements when measured with the Spearman and Kendall rank coefficients.

206

Using Dependency-Based Features to Take the 'Para-farce' out of Paraphrase

Stephen Wan, +3 more

- 01 Nov 2006

TL;DR: A machine learning approach is proposed for filtering out inconsistent novel sentences, or False Paraphrases, using the Microsoft Research Paraphrase corpus, and the paper investigates whether features based on syntactic dependencies can aid in this task.

185

•Journal Article•10.1613/jair.1.13715

Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text

Sebastian Gehrmann, +2 more

- 14 Feb 2022

- Journal of Artificial Intelligence Resea...

TL;DR: This paper surveys the issues with human and automatic model evaluations and with commonly used datasets in NLG that have been pointed out over the past 20 years, lays out a long-term vision for NLG evaluation, and proposes concrete steps for researchers to improve their evaluation processes.

176

•Journal Article•10.1145/1410358.1410359

Summarization system evaluation revisited: N-gram graphs

George Giannakopoulos, +3 more

- 13 Oct 2008

- ACM Transactions on Speech and Language ...

TL;DR: A novel automatic method for evaluating summarization systems is presented, based on comparing the character n-gram graph representations of the extracted summaries and a number of model summaries; its evaluation performance appears to match and even exceed that of other contemporary evaluation methods.

171

•Book

Handbook of Research on Text and Web Mining Technologies

Min Song, +1 more

- 30 Sep 2008

TL;DR: This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.

151

References

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Journal Article•10.1162/089120103322753356

Head-Driven Statistical Models for Natural Language Parsing

Michael Collins

- 01 Dec 2003

- Computational Linguistics

TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.

2K

•Proceedings Article•10.3115/1073445.1073465

Automatic evaluation of summaries using N-gram co-occurrence statistics

Chin-Yew Lin, +1 more

- 27 May 2003

TL;DR: The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.

2K

•Proceedings Article

A maximum-entropy-inspired parser

Eugene Charniak

- 29 Apr 2000

TL;DR: A new parser for parsing down to Penn tree-bank style parse trees is presented; it achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less, when trained and tested on the previously established sections of the Wall Street Journal treebank.

1.8K

•Proceedings Article•10.7916/D80R9XVD

Evaluating Content Selection in Summarization: The Pyramid Method

Ani Nenkova, +1 more

- 01 Jan 2004

TL;DR: It is argued that the method presented is reliable, predictive and diagnostic, and thus considerably improves on the human evaluation method currently used in the Document Understanding Conference by addressing its shortcomings.

727