Open Access
Evaluating DUC 2005 using Basic Elements
Eduard Hovy,Chin-Yew Lin,Liang Zhou +2 more
- 01 Jan 2005
TL;DR: Basic Elements, a new method for automating the evaluation of text summaries, is shown to correlate better with human judgments than any other automated procedure to date, and to overcome the subjectivity/variability problems of manual methods that require humans to preprocess the summaries to be evaluated.
Abstract: In this paper we introduce Basic Elements, a new way of automating the evaluation of text summaries. We show that this method correlates better with human judgments than any other automated procedure to date, and overcomes the subjectivity/variability problems of manual methods that require humans to preprocess summaries to be evaluated. This is demonstrated on DUC 2005 peer systems and peer-produced summaries.
Citations
Better Summarization Evaluation with Word Embeddings for ROUGE
Jun-Ping Ng,Viktoria Abrecht +1 more
- 25 Aug 2015
TL;DR: Instead of measuring lexical overlap, this article uses word embeddings to compute the semantic similarity of the words used in summaries, which achieves better correlations with human judgments when measured with the Spearman and Kendall rank coefficients.
Using Dependency-Based Features to Take the 'Para-farce' out of Paraphrase
Stephen Wan,Mark Dras,Robert Dale,Cecile Paris +3 more
- 01 Nov 2006
TL;DR: A machine learning approach is proposed for filtering out inconsistent novel sentences, or False Paraphrases, using the Microsoft Research Paraphrase corpus to investigate whether features based on syntactic dependencies can aid in this task.
Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text
TL;DR: This paper surveys the issues with human and automatic model evaluations, and with commonly used datasets in NLG, that have been pointed out over the past 20 years; it lays out a long-term vision for NLG evaluation and proposes concrete steps for researchers to improve their evaluation processes.
Summarization system evaluation revisited: N-gram graphs
TL;DR: A novel automatic method for evaluating summarization systems, based on comparing the character n-gram graph representations of the extracted summaries and of a number of model summaries, which appears to match and even exceed the evaluation performance of other contemporary evaluation methods.
•Book
Handbook of Research on Text and Web Mining Technologies
Min Song,Yi-Fang Brook Wu +1 more
- 30 Sep 2008
TL;DR: This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.
References
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni,Salim Roukos,Todd Ward,Wei-Jing Zhu +3 more
- 06 Jul 2002
TL;DR: This paper proposes a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Head-Driven Statistical Models for Natural Language Parsing
TL;DR: Three statistical models for natural language parsing are described, leading to approaches in which a parse tree is represented as the sequence of decisions corresponding to a head-centered, top-down derivation of the tree.
Automatic evaluation of summaries using N-gram co-occurrence statistics
Chin-Yew Lin,Eduard Hovy +1 more
- 27 May 2003
TL;DR: The results show that automatic evaluation using unigram co-occurrences between summary pairs correlates surprisingly well with human evaluations, based on various statistical metrics, while direct application of the BLEU evaluation procedure does not always give good results.
•Proceedings Article
A maximum-entropy-inspired parser
Eugene Charniak
- 29 Apr 2000
TL;DR: Presents a new parser that parses down to Penn tree-bank style parse trees and achieves 90.1% average precision/recall for sentences of length 40 and less, and 89.5% for sentences of length 100 and less, when trained and tested on the previously established sections of the Wall Street Journal treebank.
Evaluating Content Selection in Summarization: The Pyramid Method
Ani Nenkova,Rebecca J. Passonneau +1 more
- 01 Jan 2004
TL;DR: It is argued that the method presented is reliable, predictive, and diagnostic, and thus considerably improves on the human evaluation method currently used in the Document Understanding Conference.