Paraphrasing for Automatic Evaluation
David Kauchak, Regina Barzilay
- 04 Jun 2006
- pp 455-462
TL;DR: It is shown that the use of a paraphrased synthetic reference refines the accuracy of automatic evaluation and there is a strong connection between the quality of automatic paraphrases as judged by humans and their contribution to automatic evaluation.
Abstract: This paper studies the impact of paraphrases on the accuracy of automatic evaluation. Given a reference sentence and a machine-generated sentence, we seek to find a paraphrase of the reference sentence that is closer in wording to the machine output than the original reference. We apply our paraphrasing method in the context of machine translation evaluation. Our experiments show that the use of a paraphrased synthetic reference refines the accuracy of automatic evaluation. We also found a strong connection between the quality of automatic paraphrases as judged by humans and their contribution to automatic evaluation.
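The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the synonym table, the function names `paraphrase_reference` and `unigram_precision`, and the example sentences are all hypothetical, and a clipped unigram precision stands in for a full evaluation metric. The sketch rewrites reference words into synonyms that already occur in the machine output, then scores the output against both the original and the synthetic reference.

```python
from collections import Counter

# Hypothetical synonym table (assumption); the abstract does not say
# which paraphrase resource or substitution model the paper uses.
SYNONYMS = {
    "quick": {"fast", "rapid"},
    "car": {"automobile", "vehicle"},
}


def paraphrase_reference(reference, candidate, synonyms=SYNONYMS):
    """Build a synthetic reference whose wording is closer to the candidate.

    A reference word is replaced only when it is absent from the candidate
    and one of its listed synonyms is present there.
    """
    candidate_words = set(candidate)
    synthetic = []
    for word in reference:
        if word in candidate_words:
            synthetic.append(word)
            continue
        matches = sorted(synonyms.get(word, set()) & candidate_words)
        synthetic.append(matches[0] if matches else word)
    return synthetic


def unigram_precision(candidate, reference):
    """Fraction of candidate tokens matched in the reference (clipped counts)."""
    if not candidate:
        return 0.0
    reference_counts = Counter(reference)
    candidate_counts = Counter(candidate)
    matched = sum(
        min(count, reference_counts[word])
        for word, count in candidate_counts.items()
    )
    return matched / len(candidate)


reference = "the quick car stopped".split()
candidate = "the fast automobile stopped".split()
synthetic = paraphrase_reference(reference, candidate)

print(unigram_precision(candidate, reference))  # score against the original reference
print(unigram_precision(candidate, synthetic))  # score against the synthetic reference
```

In this toy case the candidate is an acceptable rendering that the original reference penalizes only for word choice; the synthetic reference removes that penalty. Unconstrained synonym substitution can also change meaning, which is why the abstract ties the benefit to the quality of the paraphrases.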
Citations
Survey of the state of the art in natural language generation: core tasks, applications and evaluation
Albert Gatt, Emiel Krahmer
TL;DR: This article surveys the state of the art in natural language generation, offering an up-to-date synthesis of research on the core NLG tasks and the architectures in which those tasks are organized.
A survey of paraphrasing and textual entailment methods
TL;DR: Key ideas from the two areas of paraphrasing and textual entailment are summarized by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.
•Proceedings Article
UMBC_EBIQUITY-CORE: Semantic Textual Similarity Systems
Lushan Han, Abhay L. Kashyap, Tim Finin, James Mayfield, Jonathan Weese
- 13 Jun 2013
TL;DR: Three semantic text similarity systems developed for the *SEM 2013 STS shared task used a simple term alignment algorithm augmented with penalty terms, and two used support vector regression models to combine larger sets of features.
A Survey of Paraphrasing and Textual Entailment Methods
TL;DR: Paraphrasing can be seen as bidirectional textual entailment, and methods from the two areas are often similar. Textual entailment holds for a pair of expressions when a human who reads (and trusts) the first element of the pair would most likely infer that the other element is also true.
•Posted Content
Evaluation of Text Generation: A Survey
TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.
References
The measurement of observer agreement for categorical data
J. R. Landis, Gary G. Koch
TL;DR: A general statistical methodology is presented for the analysis of multivariate categorical data arising from observer reliability studies. Tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni, Salim Roukos, Todd Ward, Wei-Jing Zhu
- 06 Jul 2002
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
•Proceedings Article
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
Satanjeev Banerjee, Alon Lavie
- 01 Jun 2005
TL;DR: METEOR is described, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations and can be easily extended to include more advanced matching strategies.
Class-based n-gram models of natural language
Peter Fitzhugh Brown, Peter Vincent Desouza, Robert Leroy Mercer, Vincent J. Della Pietra, Jenifer C. Lai
TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.
The Third PASCAL Recognizing Textual Entailment Challenge
Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, Bill Dolan
- 28 Jun 2007
TL;DR: This paper presents the Third PASCAL Recognising Textual Entailment Challenge (RTE-3), providing an overview of the dataset creation methodology and the submitted systems.