Conference
Workshop on Statistical Machine Translation
About: Workshop on Statistical Machine Translation is an academic conference. The conference publishes majorly in the area(s): Machine translation & Rule-based machine translation. Over the lifetime, 532 publications have been published by the conference receiving 22856 citations.
Topics: Machine translation, Rule-based machine translation, Example-based machine translation, Phrase, Evaluation of machine translation
Papers
1 Jun 2014
TL;DR: Meteor Universal brings language specific evaluation to previously unsupported target languages by automatically extracting linguistic resources from the bitext used to train MT systems and using a universal parameter set learned from pooling human judgments of translation quality from several language directions.
Abstract: This paper describes Meteor Universal, released for the 2014 ACL Workshop on Statistical Machine Translation. Meteor Universal brings language specific evaluation to previously unsupported target languages by (1) automatically extracting linguistic resources (paraphrase tables and function word lists) from the bitext used to train MT systems and (2) using a universal parameter set learned from pooling human judgments of translation quality from several language directions. Meteor Universal is shown to significantly outperform baseline BLEU on two new languages, Russian (WMT13) and Hindi (WMT14).
2,582 citations
Proceedings Article•
30 Jul 2011TL;DR: KenLM is a library that implements two data structures for efficient language model queries, reducing both time and memory costs and is integrated into the Moses, cdec, and Joshua translation systems.
Abstract: We present KenLM, a library that implements two data structures for efficient language model queries, reducing both time and memory costs. The Probing data structure uses linear probing hash tables and is designed for speed. Compared with the widely-used SRILM, our Probing model is 2.4 times as fast while using 57% of the memory. The Trie data structure is a trie with bit-level packing, sorted records, interpolation search, and optional quantization aimed at lower memory consumption. Trie simultaneously uses less memory than the smallest lossless baseline and less CPU than the fastest baseline. Our code is open-source, thread-safe, and integrated into the Moses, cdec, and Joshua translation systems. This paper describes the several performance techniques used and presents benchmarks against alternative implementations.
1,521 citations
1 Sep 2015
TL;DR: The proposed use of character n-gram F-score for automatic evaluation of machine translation output shows very promising results, especially for the CHRF3 score – for translation from English, this variant showed the highest segment-level correlations outperforming even the best metrics on the WMT14 shared evaluation task.
Abstract: We propose the use of character n-gram F-score for automatic evaluation of machine translation output. Character ngrams have already been used as a part of more complex metrics, but their individual potential has not been investigated yet. We report system-level correlations with human rankings for 6-gram F1-score (CHRF) on the WMT12, WMT13 and WMT14 data as well as segment-level correlation for 6gram F1 (CHRF) and F3-scores (CHRF3) on WMT14 data for all available target languages. The results are very promising, especially for the CHRF3 score – for translation from English, this variant showed the highest segment-level correlations outperforming even the best metrics on the WMT14 shared evaluation task.
1,392 citations
23 Jun 2007
TL;DR: The technical details underlying the Meteor metric are recapped, the latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.
Abstract: Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year's shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the metric and describes recent improvements in the metric. The latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.
1,384 citations
1 Jun 2014
TL;DR: The results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translationtask, a task for run-time estimation of machine translation quality, and a metrics task, are presented.
Abstract: This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries
927 citations
Performance Metrics
| Year | Papers |
|---|---|
| 2015 | 60 |
| 2014 | 65 |
| 2013 | 66 |
| 2012 | 61 |
| 2011 | 70 |
| 2010 | 63 |