(Meta-) Evaluation of Machine Translation

doi:10.3115/1626355.1626373

Open Access • Proceedings Article

Chris Callison-Burch, +4 more

- 23 Jun 2007

- pp 136-158

502

TL;DR: An extensive human evaluation was carried out not only to rank the different MT systems, but also to perform higher-level analysis of the evaluation process, revealing surprising facts about the most commonly used methodologies.

Citations

•Book•10.1075/BTL.133

From process to product: links between post-editing effort and post-edited quality

Lucas Nunes Vieira

- 02 Oct 2017

•Proceedings Article•10.3115/1599081.1599206

Domain Adaptation for Statistical Machine Translation with Domain Dictionary and Monolingual Corpora

Hua Wu, +2 more

- 18 Aug 2008

TL;DR: This method first uses out-of-domain corpora to train a baseline system and then uses in-domain translation dictionaries and in-domain monolingual corpora in a unified framework to improve the in-domain performance.

•Proceedings Article•10.48550/arXiv.2205.08533

Consistent Human Evaluation of Machine Translation across Language Pairs

Daniel Licht, +5 more

- 17 May 2022

TL;DR: A new metric called XSTS is proposed that is more focused on semantic equivalence and a cross-lingual calibration method that enables more consistent assessment of machine translation systems through human evaluation.

•Proceedings Article•10.1145/1816123.1816126

Transferring structural markup across translations using multilingual alignment and projection

David Bamman, +2 more

- 21 Jun 2010

TL;DR: This approach has the potential to allow a highly granular multilingual digital library to be bootstrapped by applying the knowledge contained in a small, heavily curated collection to a much larger but unstructured one.

•Proceedings Article

A Dataset for Assessing Machine Translation Evaluation Metrics

Lucia Specia, +2 more

- 01 May 2010

TL;DR: A dataset containing 16,000 translations produced by four machine translation systems and manually annotated for quality by professional translators is described, which can be used in a range of tasks assessing machine translation evaluation metrics.

References

•Journal Article•10.2307/2529310

The measurement of observer agreement for categorical data

J. R. Landis, +1 more

- 01 Mar 1977

- Biometrics

TL;DR: A general statistical methodology for the analysis of multivariate categorical data arising from observer reliability studies is presented; tests for interobserver bias are presented in terms of first-order marginal homogeneity, and measures of interobserver agreement are developed as generalized kappa-type statistics.

76.1K

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Proceedings Article

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

Satanjeev Banerjee, +1 more

- 01 Jun 2005

TL;DR: METEOR is described, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations, and that can be easily extended to include more advanced matching strategies.

5.9K

•Journal Article•10.1162/089120103321337421

A systematic comparison of various statistical alignment models

Franz Josef Och, +1 more

- 01 Mar 2003

- Computational Linguistics

TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.

4.6K

•Proceedings Article•10.3115/1073445.1073462

Statistical phrase-based translation

Philipp Koehn, +2 more

- 27 May 2003

TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translation.

4.1K