Top 60 papers presented at Workshop on Statistical Machine Translation in 2015

Proceedings Article•10.18653/V1/W15-3049•

chrF: character n-gram F-score for automatic MT evaluation

Maja Popović¹•Institutions (1)

1 Sep 2015

TL;DR: The proposed use of character n-gram F-score for automatic evaluation of machine translation output shows very promising results, especially for the CHRF3 score – for translation from English, this variant showed the highest segment-level correlations outperforming even the best metrics on the WMT14 shared evaluation task.

Abstract: We propose the use of character n-gram F-score for automatic evaluation of machine translation output. Character n-grams have already been used as a part of more complex metrics, but their individual potential has not been investigated yet. We report system-level correlations with human rankings for 6-gram F1-score (CHRF) on the WMT12, WMT13 and WMT14 data as well as segment-level correlation for 6-gram F1 (CHRF) and F3-scores (CHRF3) on WMT14 data for all available target languages. The results are very promising, especially for the CHRF3 score – for translation from English, this variant showed the highest segment-level correlations outperforming even the best metrics on the WMT14 shared evaluation task.
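
For readers unfamiliar with the idea, the following is a minimal Python sketch of a character n-gram F-score, not the author's implementation. It assumes that spaces are dropped before n-grams are extracted, that precision and recall are averaged arithmetically over n-gram orders 1 to 6, and that the two are combined with the standard F-beta formula (beta = 1 for an F1-style score, beta = 3 for an F3-style score). The function names `char_ngrams` and `chrf` are illustrative only.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of order n (assumption: spaces are ignored)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis, reference, max_n=6, beta=1.0):
    """Character n-gram F-beta score averaged over orders 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # this order is longer than one of the strings
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # F-beta: recall is weighted beta times as much as precision.
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

With beta = 3 the score leans toward recall, so a hypothesis that is too short is penalised more heavily than under beta = 1.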

1,392 citations

Proceedings Article•10.18653/V1/W15-3001•

Findings of the 2015 Workshop on Statistical Machine Translation

Ondřej Bojar¹, Rajen Chatterjee², Christian Federmann², Barry Haddow, Matthias Huck, Chris Hokamp³, Philipp Koehn, Varvara Logacheva³, Christof Monz⁴, Matteo Negri⁵, Matt Post⁶, Carolina Scarton³, Lucia Specia³, Marco Turchi⁵•Institutions (6)

Charles University in Prague¹, University of Edinburgh², University of Sheffield³, University of Amsterdam⁴, Fondazione Bruno Kessler⁵, Johns Hopkins University⁶

1 Sep 2015

TL;DR: The WMT15 shared tasks included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task.

Abstract: This paper presents the results of the WMT15 shared tasks, which included a standard news translation task, a metrics task, a tuning task, a task for run-time estimation of machine translation quality, and an automatic post-editing task. This year, 68 machine translation systems from 24 institutions were submitted to the ten translation directions in the standard translation task. An additional 7 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had three subtasks, with a total of 10 teams, submitting 34 entries. The pilot automatic post-editing task had a total of 4 teams, submitting 7 entries.

379 citations

Proceedings Article•10.18653/V1/W15-3031•

Results of the WMT15 Metrics Shared Task

Miloš Stanojević, Amir Kamran¹, Philipp Koehn, Ondřej Bojar²•Institutions (2)

University of Amsterdam¹, Charles University in Prague²

1 Sep 2015

TL;DR: This paper presents the results of the WMT15 Metrics Shared Task, in which participants scored the outputs of the MT systems involved in the WMT15 Shared Translation Task; the collected scores were evaluated in terms of system-level and segment-level correlation.

Abstract: This paper presents the results of the WMT15 Metrics Shared Task. We asked participants of this task to score the outputs of the MT systems involved in the WMT15 Shared Translation Task. We collected scores of 46 metrics from 11 research groups. In addition to that, we computed scores of 7 standard metrics (BLEU, SentBLEU, NIST, WER, PER, TER and CDER) as baselines. The collected scores were evaluated in terms of system level correlation (how well each metric’s scores correlate with WMT15 official manual ranking of systems) and in terms of segment level correlation (how often a metric agrees with humans in comparing two translations of a particular sentence).
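
The segment-level criterion described above (how often a metric agrees with humans when two translations of the same sentence are compared) can be illustrated with a small Python sketch. This is a Kendall's-tau-like agreement statistic under stated assumptions, not the organisers' scoring script: the input format, the function name `pairwise_agreement`, and the choice to count metric ties as disagreements are all hypothetical.

```python
def pairwise_agreement(human_pairs, metric_scores):
    """Kendall's-tau-like agreement between a metric and human pairwise judgements.

    human_pairs: list of (better_id, worse_id) tuples from human comparisons.
    metric_scores: dict mapping translation id -> metric score (higher is better).
    Assumption: a metric tie counts as a disagreement.
    """
    concordant = discordant = 0
    for better, worse in human_pairs:
        if metric_scores[better] > metric_scores[worse]:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0
```

The result ranges from -1 (the metric always contradicts the human preference) to 1 (it always agrees).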

162 citations

Proceedings Article•

Proceedings of the Tenth Workshop on Statistical Machine Translation

Ondřej Bojar¹, Rajen Chatterjee, Christian Federmann², Barry Haddow², Chris Hokamp³, Matthias Huck², Varvara Logacheva⁴, Pavel Pecina¹•Institutions (4)

Charles University in Prague¹, University of Edinburgh², Dublin City University³, University of Sheffield⁴

1 Jan 2015

43 citations

Proceedings Article•10.18653/V1/W15-3025•

The FBK Participation in the WMT15 Automatic Post-editing Shared Task

Rajen Chatterjee¹, Marco Turchi², Matteo Negri²•Institutions (2)

University of Edinburgh¹, Fondazione Bruno Kessler²

1 Sep 2015

TL;DR: This paper describes the “FBK English-Spanish Automatic Post-editing (APE)” systems submitted to the APE shared task at the WMT 2015 and introduces some novel task-specific dense features through which improvements over the default setup of these approaches are observed.

Abstract: In this paper, we describe the “FBK English-Spanish Automatic Post-editing (APE)” systems submitted to the APE shared task at the WMT 2015. We explore the most widely used statistical APE technique (monolingual) and its most significant variant (context-aware). In this exploration, we introduce some novel task-specific dense features through which we observe improvements over the default setup of these approaches. We show these features are useful to prune the phrase table in order to remove unreliable rules and help the decoder to select useful translation options during decoding. Our primary APE system submitted at this shared task performs significantly better than the standard APE baseline.

35 citations

Proceedings Article•10.18653/V1/W15-3050•

BEER 1.1: ILLC UvA submission to metrics and tuning task

Miloš Stanojević¹, Khalil Sima'an¹•Institutions (1)

University of Amsterdam¹

1 Sep 2015

TL;DR: The main changes introduced this year are: extending the learning-to-rank trained sentence level metric to the corpus level, incorporating syntactic ingredients based on dependency trees, and a technique for finding parameters of BEER that avoid “gaming of the metric” during tuning.

Abstract: We describe the submissions of ILLC UvA to the metrics and tuning tasks at WMT15. Both submissions are based on the BEER evaluation metric originally presented at WMT14 (Stanojevic and Sima’an, 2014a). The main changes introduced this year are: (i) extending the learning-to-rank trained sentence level metric to the corpus level (but still decomposable to sentence level), (ii) incorporating syntactic ingredients based on dependency trees, and (iii) a technique for finding parameters of BEER that avoid “gaming of the metric” during tuning.

28 citations

Proceedings Article•10.18653/V1/W15-3041•

SHEF-NN: Translation Quality Estimation with Neural Networks

Kashif Shah¹, Varvara Logacheva¹, Gustavo Paetzold¹, Frédéric Blain¹, Daniel Beck¹, Fethi Bougares², Lucia Specia³•Institutions (3)

University of Sheffield¹, University of Maine², Dublin City University³

1 Sep 2015

TL;DR: The authors' systems outperform the baseline as well as many other submissions for Tasks 1 and 2 of the WMT15 Shared Task on Quality Estimation and the best performing system (SHEF-W2V) only uses features learned in an unsupervised fashion.

Abstract: We describe our systems for Tasks 1 and 2 of the WMT15 Shared Task on Quality Estimation. Our submissions use (i) a continuous space language model to extract additional features for Task 1 (SHEF-GP, SHEF-SVM), (ii) a continuous bag-of-words model to produce word embeddings as features for Task 2 (SHEF-W2V) and (iii) a combination of features produced by QuEst++ and a feature produced with word embedding models (SHEF-QuEst++). Our systems outperform the baseline as well as many other submissions. The results are especially encouraging for Task 2, where our best performing system (SHEF-W2V) only uses features learned in an unsupervised fashion.

26 citations

Proceedings Article•10.18653/V1/W15-3013•

The Edinburgh/JHU Phrase-based Machine Translation Systems for WMT 2015

Barry Haddow, Matthias Huck, Alexandra Birch, Nikolay Bogoychev¹, Philipp Koehn•Institutions (1)

University of Edinburgh¹

1 Sep 2015

TL;DR: This paper describes phrase-based statistical machine translation systems set up for all ten language pairs of this year’s evaluation campaign, which are English paired with Czech, Finnish, French, German, and Russian in both translation directions.

Abstract: This paper describes the submission of the University of Edinburgh and the Johns Hopkins University for the shared translation task of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation (WMT 2015). We set up phrase-based statistical machine translation systems for all ten language pairs of this year’s evaluation campaign, which are English paired with Czech, Finnish, French, German, and Russian in both translation directions. Novel research directions we investigated include: neural network language models and bilingual neural network language models, a comprehensive use of word classes, and sparse lexicalized reordering features.

26 citations

Proceedings Article•10.18653/V1/W15-3059•

How do Humans Evaluate Machine Translation

Francisco Guzmán¹, Ahmed Abdelali¹, Irina Temnikova¹, Hassan Sajjad¹, Stephan Vogel¹•Institutions (1)

Qatar Foundation¹

1 Sep 2015

TL;DR: This paper takes a closer look at the MT evaluation process from a glass-box perspective using eye-tracking and suggests that to have consistent and cost-effective MT evaluations, it is better to use monolinguals with only target language information.

Abstract: In this paper, we take a closer look at the MT evaluation process from a glass-box perspective using eye-tracking. We analyze two aspects of the evaluation task: the background of evaluators (monolingual or bilingual) and the sources of information available, and we evaluate them using time and consistency as criteria. Our findings show that monolinguals are slower but more consistent than bilinguals, especially when only target language information is available. When exposed to various sources of information, evaluators in general take more time and, in the case of monolinguals, there is a drop in consistency. Our findings suggest that to have consistent and cost-effective MT evaluations, it is better to use monolinguals with only target language information.

20 citations

Proceedings Article•10.18653/V1/W15-3047•

Machine Translation Evaluation using Recurrent Neural Networks

Rohit Gupta¹, Constantin Orasan¹, Josef van Genabith²•Institutions (2)

University of Wolverhampton¹, German Research Centre for Artificial Intelligence²

1 Sep 2015

TL;DR: A metric based on dense vector spaces and Long Short Term Memory (LSTM) networks, which are types of Recurrent Neural Networks (RNNs), was submitted to the WMT-15 metrics task; it is the best performing metric overall according to Spearman and Pearson (Pre-TrueSkill) and second best according to Pearson (TrueSkill) system level correlation.

Abstract: This paper presents our metric (UoWLSTM) submitted in the WMT-15 metrics task. Many state-of-the-art Machine Translation (MT) evaluation metrics are complex, involve extensive external resources (e.g. for paraphrasing) and require tuning to achieve the best results. We use a metric based on dense vector spaces and Long Short Term Memory (LSTM) networks, which are types of Recurrent Neural Networks (RNNs). For WMT15 our new metric is the best performing metric overall according to Spearman and Pearson (Pre-TrueSkill) and second best according to Pearson (TrueSkill) system level correlation.

20 citations

Proceedings Article•10.18653/V1/W15-3026•

USAAR-SAPE: An English-Spanish Statistical Automatic Post-Editing System

Santanu Pal¹, Mihaela Vela¹, Sudip Kumar Naskar², Josef van Genabith³•Institutions (3)

Saarland University¹, Jadavpur University², German Research Centre for Artificial Intelligence³

1 Sep 2015

TL;DR: The USAAR-SAPE English-Spanish Automatic Post-Editing (APE) system submitted to the APE Task organized in the Workshop on Statistical Machine Translation (WMT) in 2015 was able to improve upon the baseline MT system output by incorporating Phrase-Based Statistical MT (PBSMT) technique into the monolingual Statistical APE task (SAPE).

Abstract: We describe the USAAR-SAPE English-Spanish Automatic Post-Editing (APE) system submitted to the APE Task organized in the Workshop on Statistical Machine Translation (WMT) in 2015. Our system was able to improve upon the baseline MT system output by incorporating Phrase-Based Statistical MT (PBSMT) technique into the monolingual Statistical APE task (SAPE). The reported final submission crucially involves hybrid word alignment. The SAPE system takes raw Spanish Machine Translation (MT) output provided by the shared task organizers and produces post-edited Spanish text. The parallel data consist of English text, raw machine translated Spanish output, and their corresponding manually post-edited versions. The major goal of the task is to reduce the post-editing effort by improving the quality of the MT output in terms of fluency and adequacy.

Proceedings Article•10.18653/V1/W15-3052•

LeBLEU: N-gram-based Translation Evaluation Score for Morphologically Complex Languages

Sami Virpioja¹, Stig-Arne Grönroos¹•Institutions (1)

Helsinki University of Technology¹

1 Sep 2015

TL;DR: The results on WMT data sets show that fuzzy n-gram matching improves correlations to human evaluation especially for highly compounding languages.

Abstract: This paper describes the LeBLEU evaluation score for machine translation, submitted to the WMT15 Metrics Shared Task. LeBLEU extends the popular BLEU score to consider fuzzy matches between word n-grams. While there are several variants of BLEU that allow non-exact matches between words, either by character-based distance measures or morphological preprocessing, none of them use fuzzy comparison between longer chunks of text. The results on WMT data sets show that fuzzy n-gram matching improves correlations to human evaluation, especially for highly compounding languages.

Proceedings Article•10.18653/V1/W15-3051•

Predicting Machine Translation Adequacy with Document Embeddings

Mihaela Vela¹, Liling Tan¹•Institutions (1)

Saarland University¹

1 Sep 2015

TL;DR: The approach presented here is learning a Bayesian Ridge Regressor using document skip-gram embeddings in order to automatically evaluate Machine Translation (MT) output by predicting semantic adequacy scores.

Abstract: This paper describes USAAR’s submission to the metrics shared task of the Workshop on Statistical Machine Translation (WMT) in 2015. The goal of our submission is to take advantage of the semantic overlap between hypothesis and reference translation for predicting MT output adequacy using language independent document embeddings. The approach presented here is learning a Bayesian Ridge Regressor using document skip-gram embeddings in order to automatically evaluate Machine Translation (MT) output by predicting semantic adequacy scores. The evaluation of our submission, measured by the correlation with human judgements, shows promising results on system-level scores.

Proceedings Article•10.18653/V1/W15-3036•

UAlacant word-level machine translation quality estimation system at WMT 2015

Miquel Esplà-Gomis¹, Felipe Sánchez-Martínez¹, Mikel L. Forcada¹•Institutions (1)

University of Alicante¹

1 Sep 2015

TL;DR: The Universitat d’Alacant submissions for the machine translation quality estimation (MTQE) shared task in WMT 2015 are described; the authors participated in the word-level MTQE sub-task.

Abstract: This paper describes the Universitat d’Alacant submissions (labelled as UAlacant) for the machine translation quality estimation (MTQE) shared task in WMT 2015, where we participated in the word-level MTQE sub-task. The method we used to produce our submissions uses external sources of bilingual information as a black box to spot sub-segment correspondences between a source segment S and the translation hypothesis T produced by a machine translation system. This is done by segmenting both S and T into overlapping sub-segments of variable length and translating them in both translation directions, using the available sources of bilingual information on the fly. For our submissions, two sources of bilingual information were used: machine translation (Apertium and Google Translate) and the bilingual concordancer Reverso Context. After obtaining the sub-segment correspondences, a collection of features is extracted from them, which are then used by a binary classifier to obtain the final “GOOD” or “BAD” word-level quality labels. We prepared two submissions for this year’s edition of WMT 2015: one using the features produced by our system, and one combining them with the baseline features published by the organisers of the task, which were ranked third and first for the sub-task, respectively.
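
The segmentation step mentioned in the abstract (splitting S and T into overlapping sub-segments of variable length) can be pictured with the short Python sketch below. It is only an illustration: the function name `subsegments`, the whitespace tokenisation in the usage note, and the maximum length parameter are assumptions, and the translation and feature-extraction steps are not modelled.

```python
def subsegments(tokens, max_len):
    """Return all contiguous sub-segments of 1..max_len tokens.

    Each item is (start_position, tuple_of_tokens), so that a match can be
    traced back to the word positions it covers. Sub-segments overlap by
    construction, since every start position is used for every length.
    """
    return [
        (start, tuple(tokens[start:start + length]))
        for length in range(1, max_len + 1)
        for start in range(len(tokens) - length + 1)
    ]
```

For example, `subsegments("the red house".split(), 2)` yields the three single words followed by the two bigrams "the red" and "red house".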

Proceedings Article•10.18653/V1/W15-3005•

ParFDA for Fast Deployment of Accurate Statistical Machine Translation Systems, Benchmarks, and Statistics

Ergun Bicici¹, Qun Liu¹, Andy Way¹•Institutions (1)

Dublin City University¹

17 Sep 2015

TL;DR: ParFDA, a parallel implementation of feature decay algorithms (FDA), obtains results close to the top, with an average difference of 3.176 BLEU points, while using significantly fewer resources for building SMT systems.

Abstract: We build parallel FDA5 (ParFDA) Moses statistical machine translation (SMT) systems for all language pairs in the workshop on statistical machine translation (Bojar et al., 2015) (WMT15) translation task and obtain results close to the top with an average of 3.176 BLEU points difference using significantly fewer resources for building SMT systems. ParFDA is a parallel implementation of feature decay algorithms (FDA) developed for fast deploy

Proceedings Article•10.18653/V1/W15-3017•

UdS-Sant: English-German Hybrid Machine Translation System

Santanu Pal¹, Sudip Kumar Naskar², Josef van Genabith³•Institutions (3)

Saarland University¹, Jadavpur University², German Research Centre for Artificial Intelligence³

1 Sep 2015

TL;DR: This paper describes the UdS-Sant English‐German Hybrid Machine Translation system submitted to the Translation Task organized in the Workshop on Statistical Machine Translation (WMT) 2015 and brings improvements over the baseline system by incorporating additional knowledge such as extracted bilingual named entities and bilingual phrase pairs induced from example-based methods.

Abstract: This paper describes the UdS-Sant English‐German Hybrid Machine Translation (MT) system submitted to the Translation Task organized in the Workshop on Statistical Machine Translation (WMT) 2015. Our proposed hybrid system brings improvements over the baseline system by incorporating additional knowledge such as extracted bilingual named entities and bilingual phrase pairs induced from example-based methods. The reported final submission is the result of a hybrid system obtained from confusion network based system combination that combines the best performance of each individual system in a multi-engine pipeline.

Proceedings Article•10.18653/V1/W15-3043•

UGENT-LT3 SCATE System for Machine Translation Quality Estimation

Arda Tezcan¹, Veronique Hoste¹, Bart Desmet¹, Lieve Macken¹•Institutions (1)

Ghent University¹

1 Sep 2015

TL;DR: This paper describes the submission of the UGENT-LT3 SCATE system to the WMT15 Shared Task on Quality Estimation (QE), viz. English-Spanish word and sentence-level QE.

Abstract: This paper describes the submission of the UGENT-LT3 SCATE system to the WMT15 Shared Task on Quality Estimation (QE), viz. English-Spanish word and sentence-level QE. We conceived QE as a supervised Machine Learning (ML) problem and designed additional features and combined these with the baseline feature set to estimate quality. The sentence-level QE system re-uses the word-level predictions of the word-level QE system. We experimented with different learning methods and observed improvements over the baseline system for word-level QE with the use of the new features and by combining learning methods into ensembles. For sentence-level QE we show that using a single feature based on word-level predictions can perform better than the baseline system and that using this in combination with additional features led to further improvements in performance.

Proceedings Article•10.18653/V1/W15-3022•

Abu-MaTran at WMT 2015 Translation Task: Morphological Segmentation and Web Crawling

Raphael Rubino¹, Tommi A. Pirinen¹, Miquel Esplà-Gomis², Nikola Ljubešić³, Sergio Ortiz Rojas, Vassilis Papavassiliou¹, Prokopis Prokopidis, Antonio Toral¹•Institutions (3)

Dublin City University¹, University of Alicante², University of Zagreb³

1 Sep 2015

TL;DR: This paper presents the machine translation systems submitted by the Abu-MaTran project for the Finnish‐English language pair at the WMT 2015 translation task, which are the top performing English-to-Finnish unconstrained (all automatic metrics) and constrained (BLEU), and Finnish-to-English constrained (TER) systems.

Abstract: This paper presents the machine translation systems submitted by the Abu-MaTran project for the Finnish‐English language pair at the WMT 2015 translation task. We tackle the lack of resources and complex morphology of the Finnish language by (i) crawling parallel and monolingual data from the Web and (ii) applying rule-based and unsupervised methods for morphological segmentation. Several statistical machine translation approaches are evaluated and then combined to obtain our final submissions, which are the top performing English-to-Finnish unconstrained (all automatic metrics) and constrained (BLEU), and Finnish-to-English constrained (TER) systems.

Proceedings Article•10.18653/V1/W15-3003•

Data Selection With Fewer Words

Amittai Axelrod¹, Philip Resnik¹, Xiaodong He², Mari Ostendorf³•Institutions (3)

University of Maryland, College Park¹, Microsoft², University of Washington³

1 Sep 2015

TL;DR: This work presents a method that improves data selection by combining a hybrid word/part-of-speech representation for corpora, with the idea of distinguishing between rare and frequent events.

Abstract: We present a method that improves data selection by combining a hybrid word/part-of-speech representation for corpora, with the idea of distinguishing between rare and frequent events. We validate our approach using data selection for machine translation, and show that it maintains or improves BLEU and TER translation scores while substantially improving vocabulary coverage and reducing data selection model size. Paradoxically, the coverage improvement is achieved by abstracting away over 97% of the total training corpus vocabulary using simple part-of-speech tags during the data selection process.
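
One way to picture a hybrid word/part-of-speech representation that distinguishes rare from frequent events is the Python sketch below. It is a guess at the general idea rather than the authors' procedure: the rule (keep a word if it occurs at least `min_count` times, otherwise replace it with its POS tag), the threshold value, the function name `hybrid_representation`, and the pre-tagged input format are all assumptions.

```python
from collections import Counter


def hybrid_representation(tagged_corpus, min_count=2):
    """Replace rare words with their POS tags and keep frequent words as-is.

    tagged_corpus: list of sentences, each a list of (word, pos_tag) pairs.
    min_count: hypothetical frequency threshold separating rare from frequent.
    """
    counts = Counter(word for sentence in tagged_corpus for word, _ in sentence)
    return [
        [word if counts[word] >= min_count else tag for word, tag in sentence]
        for sentence in tagged_corpus
    ]
```

Collapsing rare words into a small tag set shrinks the vocabulary that a data selection model has to cover, which is consistent with the reduction in model size reported in the abstract.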

Proceedings Article•10.18653/V1/W15-3035•

Referential Translation Machines for Predicting Translation Quality and Related Statistics

Ergun Bicici¹, Qun Liu¹, Andy Way¹•Institutions (1)

Dublin City University¹

17 Sep 2015

TL;DR: It is shown that referential translation machines pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain specific information or resource.

Abstract: We use referential translation machines (RTMs) for predicting translation performance. RTMs pioneer a language independent approach to all similarity tasks and remove the need to access any task or domain specific information or resource. We improve our RTM models with the

Proceedings Article•10.18653/V1/W15-3027•

Why Predicting Post-Edition is so Hard? Failure Analysis of LIMSI Submission to the APE Shared Task

Guillaume Wisniewski¹, Nicolas Pécheux², François Yvon³•Institutions (3)

University of Paris-Sud¹, Université Paris-Saclay², Centre national de la recherche scientifique³

1 Sep 2015

TL;DR: It is shown, by carefully analyzing the failure of the two systems submitted by LIMSI to the WMT’15 Shared Task on Automatic Post-Editing, that this counterperformance mainly results from the inconsistency in the annotations.

Abstract: This paper describes the two systems submitted by LIMSI to the WMT’15 Shared Task on Automatic Post-Editing. The first one relies on a reformulation of the APE task as a Machine Translation task; the second implements a simple rule-based approach. Neither of these two systems manages to improve the automatic translation. We show, by carefully analyzing the failure of our systems, that this counterperformance mainly results from the inconsistency in the annotations.

Proceedings Article•10.18653/V1/W15-3018•

The RWTH Aachen German-English Machine Translation System for WMT 2015

Jan-Thorsten Peter¹, Farzad Toutounchi, Joern Wuebker¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

1 Sep 2015

TL;DR: This paper describes the statistical machine translation system developed at RWTH Aachen University for the German→English translation task of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation (WMT 2015).

Abstract: This paper describes the statistical machine translation system developed at RWTH Aachen University for the German→English translation task of the EMNLP 2015 Tenth Workshop on Statistical Machine Translation (WMT 2015). A phrase-based machine translation system was applied and augmented with hierarchical phrase reordering and word class language models. Further, we ran discriminative maximum expected BLEU training for our system. In addition, we utilized multiple feed-forward neural network language and translation models and a recurrent neural network language model for reranking.

Proceedings Article•10.18653/V1/W15-3016•

LIMSI@WMT'15: Translation Task

Benjamin Marie, Alexandre Allauzen¹, Franck Burlot, Quoc-Khanh Do, Julia Ive¹, Elena Knyazeva, Matthieu Labeau, Thomas Lavergne², Kevin Löser, Nicolas Pécheux, François Yvon•Institutions (2)

Université Paris-Saclay¹, Franche Comté Électronique Mécanique Thermique et Optique Sciences et Technologies²

1 Sep 2015

TL;DR: LIMSI’s submissions to the shared WMT’15 translation task are described, including a tailored normalization of Russian to translate into English, and a two-step process to translate first into simplified Russian, followed by a conversion into inflected Russian.

Abstract: This paper describes LIMSI’s submissions to the shared WMT’15 translation task. We report results for French-English, Russian-English in both directions, as well as for Finnish-into-English. Our submissions use NCODE and MOSES along with continuous space translation models in a post-processing step. The main novelties of this year’s participation are the following: for Russian-English, we investigate a tailored normalization of Russian to translate into English, and a two-step process to translate first into simplified Russian, followed by a conversion into inflected Russian. For French-English, the challenge is domain adaptation, for which only monolingual corpora are available. Finally, for the Finnish-to-English task, we explore unsupervised morphological segmentation to reduce the sparsity of data induced by the rich morphology on the Finnish side.

Proceedings Article•10.18653/V1/W15-3038•

LORIA System for the WMT15 Quality Estimation Shared Task

David Langlois

17 Sep 2015

TL;DR: This paper proposes to increase the size of the training corpus by using the post-edited and reference corpora during the training step and performs a linear regression of the feature space against scores in the range [0..1].

Abstract: We describe our system for the WMT2015 Shared Task on Quality Estimation, task 1, sentence-level prediction of post-edition effort. We use baseline features, Latent Semantic Indexing based features and features based on pseudo-references. An SVM algorithm is used to estimate the linear regression between the feature vectors and the HTER score. We use a selection algorithm to set aside needless features. Our best system achieves a Mean Absolute Error of 13.34 on the official test set, while the official baseline system achieves 14.82.

Proceedings Article•10.18653/V1/W15-3048•

Alignment-based sense selection in METEOR and the RATATOUILLE recipe

Benjamin Marie, Marianna Apidianaki

1 Sep 2015

TL;DR: It is shown that context-sensitive synonym selection increases the correlation of the Meteor metric with human judgments of translation quality on the WMT14 data.

Abstract: This paper describes Meteor-WSD and RATATOUILLE, the LIMSI submissions to the WMT15 metrics shared task. Meteor-WSD extends synonym mapping to languages other than English based on alignments and gives credit to semantically adequate translations in context. We show that context-sensitive synonym selection increases the correlation of the Meteor metric with human judgments of translation quality on the WMT14 data. RATATOUILLE combines Meteor-WSD with nine other metrics for evaluation and outperforms the best metric (BEER) involved in its computation.

Proceedings Article•10.18653/V1/W15-3030•

ListNet-based MT Rescoring

Jan Niehues¹, Quoc-Khanh Do, Alexandre Allauzen², Alex Waibel¹•Institutions (2)

Karlsruhe Institute of Technology¹, Université Paris-Saclay²

1 Sep 2015

TL;DR: This work presents a new technique to train the log-linear model based on the ListNet algorithm that scales to many features, considers the whole list and not single entries during learning and can also be applied to more complex models than a log-linear combination.

Abstract: The log-linear combination of different features is an important component of SMT systems. It allows for the easy integration of models into the system and is used during decoding as well as for n-best list rescoring. With the recent success of more complex models like neural network-based translation models, n-best list rescoring is again attracting more attention. In this work, we present a new technique to train the log-linear model based on the ListNet algorithm. This technique scales to many features, considers the whole list and not single entries during learning and can also be applied to more complex models than a log-linear combination. Using the new learning approach, we improve the translation quality of a large-scale system by 0.8 BLEU points during rescoring and generate translations which are up to 0.3 BLEU points better than other learning techniques such as MERT or MIRA.
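
To make the listwise idea concrete, here is a minimal Python sketch of a ListNet-style top-one loss for one n-best list. It is a sketch under assumptions, not the authors' training code: it assumes each hypothesis has a feature vector and a target quality score (for instance a sentence-level metric score), scores hypotheses with a log-linear model (a weighted feature sum), and compares the softmax of model scores with the softmax of target scores by cross-entropy. The names `loglinear_score`, `softmax`, and `listnet_loss` are illustrative.

```python
import math


def loglinear_score(weights, features):
    """Log-linear model score: weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def softmax(scores):
    """Turn a list of scores into a probability distribution over the list."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(model_scores, target_scores):
    """Cross-entropy between top-one distributions of targets and model."""
    p_target = softmax(target_scores)
    p_model = softmax(model_scores)
    return -sum(t * math.log(q) for t, q in zip(p_target, p_model))
```

Because the loss is computed over the whole list at once, a change to any single hypothesis score shifts the probability of every other entry, which is what distinguishes this from training on single entries.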

Proceedings Article•10.18653/V1/W15-3021•

Morphological Segmentation and OPUS for Finnish-English Machine Translation

Jörg Tiedemann¹, Filip Ginter², Jenna Kanerva²•Institutions (2)

Uppsala University¹, University of Turku²

1 Sep 2015

TL;DR: Baseline systems for Finnish-English and English-Finnish machine translation using standard phrase-based and factored models, including morphological features, are described, and the effectiveness of morphological pre-processing of Finnish is demonstrated.

Abstract: This paper describes baseline systems for Finnish-English and English-Finnish machine translation using standard phrase-based and factored models including morphological features. We experiment with compound splitting and morphological segmentation and study the effect of adding noisy out-of-domain data to the parallel and the monolingual training data. Our results stress the importance of training data and demonstrate the effectiveness of morphological pre-processing of Finnish.

Proceedings Article•10.18653/V1/W15-3032•

Results of the WMT15 Tuning Shared Task

Miloš Stanojević, Amir Kamran¹, Ondřej Bojar²•Institutions (2)

University of Amsterdam¹, Charles University in Prague²

1 Sep 2015

TL;DR: This paper presents the results of the WMT15 Tuning Shared Task, which provided the participants of this task with a complete machine translation system and asked them to tune its internal parameters (feature weights).

Abstract: This paper presents the results of the WMT15 Tuning Shared Task. We provided the participants of this task with a complete machine translation system and asked them to tune its internal parameters (feature weights). The tuned systems were used to translate the test set and the outputs were manually ranked for translation quality. We received 4 submissions in the English-Czech and 6 in the Czech-English translation direction. In addition, we ran 3 baseline setups, tuning the parameters with standard optimizers for BLEU score.

Proceedings Article•10.18653/V1/W15-3004•

DFKI's experimental hybrid MT system for WMT 2015

Eleftherios Avramidis¹, Maja Popović², Aljoscha Burchardt¹•Institutions (2)

German Research Centre for Artificial Intelligence¹, Humboldt University of Berlin²

1 Sep 2015

TL;DR: DFKI participated in the shared translation task of WMT 2015 with the German-English language pair in each translation direction using an experimental hybrid system based on three systems: a statistical Moses system, a commercial rule-based system, and a serial coupling of the two.

Abstract: DFKI participated in the shared translation task of WMT 2015 with the German-English language pair in each translation direction. The submissions were generated using an experimental hybrid system based on three systems: a statistical Moses system, a commercial rule-based system, and a serial coupling of the two where the output of the rule-based system is further translated by Moses trained on parallel text consisting of the rule-based output and the original target language. The outputs of the three systems are combined using two methods: (a) an empirical selection mechanism based on grammatical features (primary submission) and (b) IBM1 models based on POS 4-grams (contrastive submission).

Proceedings Article•10.18653/V1/W15-3060•

Local System Voting Feature for Machine Translation System Combination

Markus Freitag¹, Jan-Thorsten Peter¹, Stephan Peitz¹, Minwei Feng², Hermann Ney¹•Institutions (2)

RWTH Aachen University¹, IBM²

1 Sep 2015

TL;DR: In this paper, the authors enhance the traditional confusion network system combination approach with an additional model trained by a neural network, which gives system combination the option to prefer other systems at different word positions even for the same sentence.

Abstract: In this paper, we enhance the traditional confusion network system combination approach with an additional model trained by a neural network. This work is motivated by the fact that the commonly used binary system voting models assign each input system only a global weight, which determines the impact of that system on all translations. This prevents individual systems with low system weights from having influence on the system combination output, although in some situations this could be helpful. Further, words which have been seen by only one or a few systems rarely have a chance of being present in the combined output. We train a local system voting model by a neural network which is based on the words themselves and the combinatorial occurrences of the different system outputs. This gives system combination the option to prefer other systems at different word positions even for the same sentence.
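
The limitation of global voting that motivates this work can be shown with a small sketch. It is a simplified illustration, not the system described in the paper: the confusion network is reduced to a list of slots mapping a system name to the word that system proposes, other features such as a language model are ignored, and the system names, weights, and words are all hypothetical.

```python
def combine_with_global_votes(confusion_network, system_weights):
    """Pick, for each slot, the word with the highest summed system weight.

    Each system casts a binary vote for the word it produced, and every
    vote is scaled by that system's single global weight.
    """
    output = []
    for slot in confusion_network:
        totals = {}
        for system, word in slot.items():
            totals[word] = totals.get(word, 0.0) + system_weights[system]
        output.append(max(totals, key=totals.get))
    return output

# Hypothetical global weights: system C has little influence everywhere.
system_weights = {"A": 0.6, "B": 0.3, "C": 0.1}

# Hypothetical aligned outputs of the three systems, one slot per position.
confusion_network = [
    {"A": "the", "B": "the", "C": "a"},
    {"A": "house", "B": "home", "C": "home"},
    {"A": "is", "B": "is", "C": "was"},
]

combined = combine_with_global_votes(confusion_network, system_weights)
```

With these weights, system A wins the second slot even though two systems agree on a different word, and a word proposed only by system C cannot win any slot. A local voting model, as proposed in the abstract, would instead let the preference between systems vary from one word position to the next.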
