Paraphrasing for Automatic Evaluation

doi:10.3115/1220835.1220893

Open Access • Proceedings Article

David Kauchak, +1 more

- 04 Jun 2006

- pp 455-462

273

TL;DR: It is shown that the use of a paraphrased synthetic reference refines the accuracy of automatic evaluation and there is a strong connection between the quality of automatic paraphrases as judged by humans and their contribution to automatic evaluation.

Citations

•Journal Article•10.1613/JAIR.5477

Survey of the state of the art in natural language generation: core tasks, applications and evaluation

Albert Gatt, +1 more

- 01 Jan 2018

- Journal of Artificial Intelligence Resea...

TL;DR: This article surveys the state of the art in natural language generation, with an up-to-date synthesis of research on the core tasks in NLG and the architectures in which such tasks are organized.

762

•Journal Article•10.1613/JAIR.2985

A survey of paraphrasing and textual entailment methods

Ion Androutsopoulos, +1 more

- 01 May 2010

- Journal of Artificial Intelligence Resea...

TL;DR: Key ideas from the two areas of paraphrasing and textual entailment are summarized by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.

458

•Proceedings Article

UMBC_EBIQUITY-CORE: Semantic Textual Similarity Systems

Lushan Han, +4 more

- 13 Jun 2013

TL;DR: Of three semantic text similarity systems developed for the *SEM 2013 STS shared task, one used a simple term alignment algorithm augmented with penalty terms, and two used support vector regression models to combine larger sets of features.

422

•Journal Article•10.1613/JAIR.2985

A Survey of Paraphrasing and Textual Entailment Methods

Ion Androutsopoulos, +1 more

- 18 Dec 2009

- arXiv: Computation and Language

TL;DR: Paraphrasing can be seen as bidirectional textual entailment, and methods from the two areas are often similar. Textual entailment methods recognize, generate, or extract pairs of natural language expressions such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true.

388

•Posted Content

Evaluation of Text Generation: A Survey

Asli Celikyilmaz, +2 more

- 26 Jun 2020

- arXiv: Computation and Language

TL;DR: This paper surveys evaluation methods of natural language generation (NLG) systems that have been developed in the last few years, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models.

376

References

•Journal Article•10.2307/2529310

The measurement of observer agreement for categorical data

J. R. Landis, +1 more

- 01 Mar 1977

- Biometrics

TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.

76.1K

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Proceedings Article

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

Satanjeev Banerjee, +1 more

- 01 Jun 2005

TL;DR: METEOR is described, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations and can be easily extended to include more advanced matching strategies.

5.9K

•Journal Article•10.5555/176313.176316

Class-based n-gram models of natural language

Peter Fitzhugh Brown, +4 more

- 01 Dec 1992

- Computational Linguistics

TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.

3.6K

•Book Chapter•10.1007/11736790_9

The Third PASCAL Recognizing Textual Entailment Challenge

Danilo Giampiccolo, +3 more

- 28 Jun 2007

TL;DR: This paper presents the Third PASCAL Recognising Textual Entailment Challenge (RTE-3), providing an overview of the dataset creating methodology and the submitted systems.

2.5K

Related Papers (5)

Bleu: a Method for Automatic Evaluation of Machine Translation

A Study of Translation Edit Rate with Targeted Human Annotation

Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

WordNet: an electronic lexical database

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments