Workshop on Statistical Machine Translation

Conference Tools

Papers published on a yearly basis

Papers

Proceedings Article • 10.3115/V1/W14-3348

Meteor Universal: Language Specific Translation Evaluation for Any Target Language

Michael Denkowski¹, Alon Lavie¹•Institutions (1)

Carnegie Mellon University¹

1 Jun 2014

TL;DR: Meteor Universal brings language specific evaluation to previously unsupported target languages by automatically extracting linguistic resources from the bitext used to train MT systems and using a universal parameter set learned from pooling human judgments of translation quality from several language directions.

Abstract: This paper describes Meteor Universal, released for the 2014 ACL Workshop on Statistical Machine Translation. Meteor Universal brings language specific evaluation to previously unsupported target languages by (1) automatically extracting linguistic resources (paraphrase tables and function word lists) from the bitext used to train MT systems and (2) using a universal parameter set learned from pooling human judgments of translation quality from several language directions. Meteor Universal is shown to significantly outperform baseline BLEU on two new languages, Russian (WMT13) and Hindi (WMT14).

2,582 citations

Proceedings Article

KenLM: Faster and Smaller Language Model Queries

Kenneth Heafield¹•Institutions (1)

Carnegie Mellon University¹

30 Jul 2011

TL;DR: KenLM is a library that implements two data structures for efficient language model queries, reducing both time and memory costs; it is integrated into the Moses, cdec, and Joshua translation systems.

Abstract: We present KenLM, a library that implements two data structures for efficient language model queries, reducing both time and memory costs. The Probing data structure uses linear probing hash tables and is designed for speed. Compared with the widely-used SRILM, our Probing model is 2.4 times as fast while using 57% of the memory. The Trie data structure is a trie with bit-level packing, sorted records, interpolation search, and optional quantization aimed at lower memory consumption. Trie simultaneously uses less memory than the smallest lossless baseline and less CPU than the fastest baseline. Our code is open-source, thread-safe, and integrated into the Moses, cdec, and Joshua translation systems. This paper describes the several performance techniques used and presents benchmarks against alternative implementations.
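
To make the Probing idea from the abstract concrete, the following is a minimal sketch of a hash table with linear probing that maps n-gram keys to log-probabilities. The class name `ProbingTable`, the `space_multiplier` parameter, and the use of Python's built-in `hash` are assumptions made for illustration; KenLM's actual implementation differs in detail and is not reproduced here.

```python
class ProbingTable:
    """Fixed-size open-addressing hash table with linear probing (illustrative)."""

    def __init__(self, expected_entries, space_multiplier=1.5):
        # Over-allocate buckets so probe sequences stay short; always keep
        # at least one spare bucket so lookups can stop at an empty slot.
        self.size = max(int(expected_entries * space_multiplier),
                        expected_entries + 1)
        self.keys = [None] * self.size
        self.values = [0.0] * self.size

    def _slot(self, key):
        # Hypothetical hash choice: Python's built-in hash, reduced modulo size.
        return hash(key) % self.size

    def insert(self, key, value):
        i = self._slot(key)
        for _ in range(self.size):
            if self.keys[i] is None or self.keys[i] == key:
                self.keys[i] = key
                self.values[i] = value
                return
            i = (i + 1) % self.size  # step to the next bucket, wrapping around
        raise RuntimeError("table is full")

    def lookup(self, key):
        i = self._slot(key)
        for _ in range(self.size):
            if self.keys[i] is None:
                return None  # empty bucket reached: key is absent
            if self.keys[i] == key:
                return self.values[i]
            i = (i + 1) % self.size
        return None


table = ProbingTable(expected_entries=3)
table.insert(("the", "cat"), -1.2)
table.insert(("cat", "sat"), -0.7)
print(table.lookup(("the", "cat")))
print(table.lookup(("a", "dog")))
```

A lookup touches consecutive buckets, which is friendly to CPU caches; the price is the spare capacity controlled by `space_multiplier`.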

1,521 citations

Proceedings Article • 10.18653/V1/W15-3049

chrF: character n-gram F-score for automatic MT evaluation

Maja Popović¹•Institutions (1)

Humboldt University of Berlin¹

1 Sep 2015

TL;DR: The proposed use of character n-gram F-score for automatic evaluation of machine translation output shows very promising results, especially for the CHRF3 score – for translation from English, this variant showed the highest segment-level correlations outperforming even the best metrics on the WMT14 shared evaluation task.

Abstract: We propose the use of character n-gram F-score for automatic evaluation of machine translation output. Character n-grams have already been used as a part of more complex metrics, but their individual potential has not been investigated yet. We report system-level correlations with human rankings for 6-gram F1-score (CHRF) on the WMT12, WMT13 and WMT14 data as well as segment-level correlation for 6-gram F1 (CHRF) and F3-scores (CHRF3) on WMT14 data for all available target languages. The results are very promising, especially for the CHRF3 score – for translation from English, this variant showed the highest segment-level correlations outperforming even the best metrics on the WMT14 shared evaluation task.
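
As a rough illustration of the character n-gram F-score described in the abstract, the sketch below averages character n-gram precision and recall over orders 1 to 6 and combines them with an F-beta formula, where beta = 1 would correspond to CHRF and beta = 3 to CHRF3. The function names, the removal of whitespace, and the uniform averaging over n-gram orders are assumptions made for illustration; this is not the reference implementation of the metric.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n, ignoring whitespace (an assumption)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=1.0):
    """Sketch of a character n-gram F-beta score in the range [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    # F-beta weights recall beta times as heavily as precision.
    return (1 + b2) * p * r / (b2 * p + r)


print(chrf("the cat sat", "the cat sat"))
print(chrf("the cat", "the hat", beta=3.0))
```

With beta above 1, a hypothesis that misses part of the reference is penalized more than one that adds extra material, which is the intuition behind the recall-weighted CHRF3 variant.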

1,392 citations

Proceedings Article • 10.3115/1626355.1626389

METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments

Alon Lavie¹, Abhaya Agarwal¹•Institutions (1)

Carnegie Mellon University¹

23 Jun 2007

TL;DR: This paper recaps the technical details underlying the Meteor metric and describes the latest release, which includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.

Abstract: Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year's shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the metric and describes recent improvements in the metric. The latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.

1,384 citations

Proceedings Article • 10.3115/V1/W14-3302

Findings of the 2014 Workshop on Statistical Machine Translation

Ondrej Bojar¹, Christian Buck², Christian Federmann², Barry Haddow, Philipp Koehn, Johannes Leveling³, Christof Monz⁴, Pavel Pecina¹, Matt Post⁵, Herve Saint-Amand², Radu Soricut⁶, Lucia Specia⁷, Aleš Tamchyna¹•Institutions (7)

Charles University in Prague¹, University of Edinburgh², Dublin City University³, University of Amsterdam⁴, Johns Hopkins University⁵, Google⁶, University of Sheffield⁷

1 Jun 2014

TL;DR: The results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task, are presented.

Abstract: This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries

927 citations

No. of papers from the Conference in previous years
Year	Papers
2015	60
2014	65
2013	66
2012	61
2011	70
2010	63