Top 65 papers presented at Workshop on Statistical Machine Translation in 2014

Proceedings Article•10.3115/V1/W14-3348•

Meteor Universal: Language Specific Translation Evaluation for Any Target Language

Michael Denkowski¹, Alon Lavie¹•Institutions (1)

1 Jun 2014

TL;DR: Meteor Universal brings language specific evaluation to previously unsupported target languages by automatically extracting linguistic resources from the bitext used to train MT systems and using a universal parameter set learned from pooling human judgments of translation quality from several language directions.

Abstract: This paper describes Meteor Universal, released for the 2014 ACL Workshop on Statistical Machine Translation. Meteor Universal brings language specific evaluation to previously unsupported target languages by (1) automatically extracting linguistic resources (paraphrase tables and function word lists) from the bitext used to train MT systems and (2) using a universal parameter set learned from pooling human judgments of translation quality from several language directions. Meteor Universal is shown to significantly outperform baseline BLEU on two new languages, Russian (WMT13) and Hindi (WMT14).

2,582 citations

Proceedings Article•10.3115/V1/W14-3302•

Findings of the 2014 Workshop on Statistical Machine Translation

Ondrej Bojar¹, Christian Buck², Christian Federmann², Barry Haddow, Philipp Koehn, Johannes Leveling³, Christof Monz⁴, Pavel Pecina¹, Matt Post⁵, Herve Saint-Amand², Radu Soricut⁶, Lucia Specia⁷, Aleš Tamchyna¹•Institutions (7)

Charles University in Prague¹, University of Edinburgh², Dublin City University³, University of Amsterdam⁴, Johns Hopkins University⁵, Google⁶, University of Sheffield⁷

1 Jun 2014

TL;DR: The results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task, are presented.

Abstract: This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries.

927 citations

Proceedings Article•10.3115/V1/W14-3346•

A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU

Boxing Chen¹, Colin Cherry²•Institutions (2)

National Research Council¹, University of Alberta²

1 Jun 2014

TL;DR: Seven smoothing techniques for sentence-level BLEU are systematically compared. Three of them are first proposed in this paper and correlate better with human judgments at the sentence level than the other techniques; the performance of all seven in statistical machine translation tuning is also compared.

Abstract: BLEU is the de facto standard machine translation (MT) evaluation metric. However, because BLEU computes a geometric mean of n-gram precisions, it often correlates poorly with human judgment on the sentence-level. Therefore, several smoothing techniques have been proposed. This paper systematically compares 7 smoothing techniques for sentence-level BLEU. Three of them are first proposed in this paper, and they correlate better with human judgments on the sentence-level than other smoothing techniques. Moreover, we also compare the performance of using the 7 smoothing techniques in statistical machine translation tuning.

309 citations
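
As an illustration of why smoothing matters at the sentence level, here is a minimal Python sketch. It is not the paper's implementation: it applies add-one smoothing to the higher-order n-gram precisions, a common baseline chosen here as an assumption, and it is not claimed to be one of the three techniques the paper proposes. The names `ngrams` and `sentence_bleu` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4, smooth=True):
    """Sentence-level BLEU against a single reference translation.

    With smooth=True, add-one smoothing is applied to the n-gram
    precisions for n > 1 (an illustrative assumption, see lead-in).
    """
    if not hypothesis:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        ref_counts = ngrams(reference, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if smooth and n > 1:
            matches += 1
            total += 1
        if matches == 0 or total == 0:
            return 0.0  # one zero precision zeroes the geometric mean
        log_precision_sum += math.log(matches / total)
    # Brevity penalty for hypotheses that are not longer than the reference.
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / len(hypothesis))
    return bp * math.exp(log_precision_sum / max_n)


hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
unsmoothed = sentence_bleu(hyp, ref, smooth=False)
smoothed = sentence_bleu(hyp, ref, smooth=True)
```

In this example the hypothesis shares no 4-gram with the reference, so the unsmoothed score collapses to zero even though most words match, while the smoothed score stays positive. This is the failure mode of the geometric mean that the abstract describes.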

Proceedings Article•10.3115/V1/W14-3340•

FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task

José G. C. de Souza¹, Jesús González-Rubio², Christian Buck³, Marco Turchi⁴, Matteo Negri⁵•Institutions (5)

Dublin City University¹, Polytechnic University of Valencia², University of Edinburgh³, University of Sheffield⁴, Fondazione Bruno Kessler⁵

1 Jun 2014

TL;DR: The joint submission of Fondazione Bruno Kessler, Universitat Politècnica de València and University of Edinburgh to the Quality Estimation tasks of the Workshop on Statistical Machine Translation 2014 is described.

Abstract: This paper describes the joint submission of Fondazione Bruno Kessler, Universitat Politècnica de València and University of Edinburgh to the Quality Estimation tasks of the Workshop on Statistical Machine Translation 2014. We present our submissions for Task 1.2, 1.3 and 2. Our systems ranked first for Task 1.2 and for the Binary and Level1 settings in Task 2.

56 citations

Proceedings Article•10.3115/V1/W14-3311•

Phrasal: A Toolkit for New Directions in Statistical Machine Translation

Spence Green¹, Daniel Cer¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

1 Jun 2014

TL;DR: A new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation, is presented, which includes features that support emerging research trends such as tuning with large feature sets, and web-based interactive machine translation.

Abstract: We present a new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding speed and tuning time.

50 citations

Proceedings Article•

Edinburgh’s Phrase-based Machine Translation Systems for WMT-14

Nadir Durrani¹, Barry Haddow¹, Philipp Koehn¹, Kenneth Heafield¹•Institutions (1)

University of Edinburgh¹

1 Jun 2014

TL;DR: UEDIN’s phrase-based submissions to the translation and medical translation shared tasks of the 2014 Workshop on Statistical Machine Translation (WMT) are described.

Abstract: This paper describes the University of Edinburgh’s (UEDIN) phrase-based submissions to the translation and medical translation shared tasks of the 2014 Workshop on Statistical Machine Translation (WMT). We participated in all language pairs. We have improved upon our 2013 system by i) using generalized representations, specifically automatic word clusters for translations out of English, ii) using unsupervised character-based models to translate unknown words in Russian-English and Hindi-English pairs, iii) synthesizing Hindi data from closely-related Urdu data, and iv) building huge language models on the common crawl corpus.

41 citations

Proceedings Article•10.3115/V1/W14-3352•

DiscoTK: Using Discourse Structure for Machine Translation Evaluation

Shafiq Joty¹, Francisco Guzmán², Lluís Màrquez², Preslav Nakov³•Institutions (3)

Qatar Foundation¹, Qatar Computing Research Institute², Cairo University³

1 Jun 2014

TL;DR: Novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference are presented.

Abstract: We present novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference. We experiment with five transformations and augmentations of a base discourse tree representation based on the rhetorical structure theory, and we combine the kernel scores for each of them into a single score. Finally, we add other metrics from the ASIYA MT evaluation toolkit, and we tune the weights of the combination on actual human judgments. Experiments on the WMT12 and WMT13 metrics shared task datasets show correlation with human judgments that outperforms what the best systems that participated in these years achieved, both at the segment and at the system level.

40 citations

Proceedings Article•10.18653/V1/W15-3024•

Edinburgh’s Syntax-Based Systems at WMT 2014

Philip Williams, Rico Sennrich, Maria Nadejde¹, Matthias Huck, Eva Hasler, Philipp Koehn•Institutions (1)

University of Edinburgh¹

1 Jun 2014

TL;DR: This paper describes the string-to-tree systems built at the University of Edinburgh for the WMT 2014 shared translation task and improved the English-German system through target-side compound splitting, morphosyntactic constraints, and refinements to parse tree annotation.

Abstract: This paper describes the string-to-tree systems built at the University of Edinburgh for the WMT 2014 shared translation task. We developed systems for English-German, Czech-English, French-English, German-English, Hindi-English, and Russian-English. This year we improved our English-German system through target-side compound splitting, morphosyntactic constraints, and refinements to parse tree annotation; we addressed the out-of-vocabulary problem using transliteration for Hindi and Russian and using morphological reduction for Russian; we improved our German-English system through tree binarization; and we reduced system development time by filtering the tuning sets.

34 citations

Proceedings Article•10.3115/V1/W14-3308•

The IIT Bombay Hindi-English Translation System at WMT 2014

Piyush Dungarwal¹, Rajen Chatterjee¹, Abhijit Mishra¹, Anoop Kunchukuttan¹, Ritesh Shah¹, Pushpak Bhattacharyya¹•Institutions (1)

Indian Institute of Technology Bombay¹

1 Jun 2014

TL;DR: It is shown that the use of number, case and Tree Adjoining Grammar information as factors helps to improve English-Hindi translation, primarily by generating morphological inflections correctly.

Abstract: In this paper, we describe our English-Hindi and Hindi-English statistical systems submitted to the WMT14 shared task. The core components of our translation systems are phrase based (Hindi-English) and factored (English-Hindi) SMT systems. We show that the use of number, case and Tree Adjoining Grammar information as factors helps to improve English-Hindi translation, primarily by generating morphological inflections correctly. We show improvements to the translation systems using pre-processing and post-processing components. To overcome the structural divergence between English and Hindi, we preorder the source side sentence to conform to the target language word order. Since the parallel corpus is limited, many words are not translated. We translate out-of-vocabulary words and transliterate named entities in a post-processing stage. We also investigate ranking of translations from multiple systems to select the best translation.

32 citations

Proceedings Article•10.3115/V1/W14-3342•

LIG System for Word Level QE task at WMT14

Ngoc Quang Luong, Laurent Besacier, Benjamin Lecouteux¹•Institutions (1)

University of Grenoble¹

1 Jun 2014

TL;DR: The Word-level QE system for the WMT 2014 shared task on the Spanish-English pair is described. It is optimized in several ways: tuning the classification threshold, combining with WMT 2013 data, and refining with a feature selection strategy on the development set, before dealing with the test set for submission.

Abstract: This paper describes our Word-level QE system for the WMT 2014 shared task on the Spanish-English pair. Compared to WMT 2013, this year's task is different due to the lack of SMT setting information and additional resources. We report how we overcome this challenge to retain most of the important features which performed well last year in our system. Novel features related to the availability of multiple systems' output (new this year) are also proposed and tested along with the baseline set. The system is optimized in several ways: tuning the classification threshold, combining with WMT 2013 data, and refining with a feature selection strategy on our development set, before dealing with the test set for submission.

32 citations

Proceedings Article•

Proceedings of the Ninth Workshop on Statistical Machine Translation

Ondřej Bojar¹, Christian Buck², Christian Federmann², Barry Haddow², Philipp Koehn², Christof Monz³, Matt Post⁴, Lucia Specia⁵•Institutions (5)

Charles University in Prague¹, University of Edinburgh², University of Amsterdam³, Johns Hopkins University⁴, University of Sheffield⁵

1 Jan 2014

Proceedings Article•10.3115/V1/W14-3310•

EU-BRIDGE MT: Combined Machine Translation

Markus Freitag¹, Stephan Peitz¹, Joern Wuebker¹, Hermann Ney¹, Matthias Huck, Rico Sennrich, Nadir Durrani², Maria Nadejde², Philip Williams, Philipp Koehn, Teresa Herrmann³, Eunah Cho³, Alex Waibel⁴•Institutions (4)

RWTH Aachen University¹, University of Edinburgh², Karlsruhe Institute of Technology³, Facebook⁴

1 Jun 2014

TL;DR: Three research institutes involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the shared translation task of the evaluation campaign at the ACL 2014 Eighth Workshop on Statistical Machine Translation.

Abstract: This paper describes one of the collaborative efforts within EU-BRIDGE to further advance the state of the art in machine translation between two European language pairs, German→English and English→German. Three research institutes involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the shared translation task of the evaluation campaign at the ACL 2014 Eighth Workshop on Statistical Machine Translation (WMT 2014). We combined up to nine different machine translation engines via system combination. RWTH Aachen University, the University of Edinburgh, and Karlsruhe Institute of Technology developed several individual systems which serve as system combination input. We devoted special attention to building syntax-based systems and combining them with the phrase-based ones. The joint setups yield empirical gains of up to 1.6 points in BLEU and 1.0 points in TER on the WMT newstest2013 test set compared to the best single systems.

Proceedings Article•10.3115/V1/W14-3356•

Crowdsourcing High-Quality Parallel Data Extraction from Twitter

Wang Ling¹, Luís Marujo¹, Chris Dyer¹, Alan W. Black¹, Isabel Trancoso²•Institutions (2)

Carnegie Mellon University¹, INESC-ID²

1 Jun 2014

TL;DR: The quality of the crowdsourced corpus is significantly better than existing automatic methods: it obtains a performance comparable to expert annotations when used in MERT tuning of a microblog MT system; and training a parallel sentence classifier with it also leads to improved results.

Abstract: High-quality parallel data is crucial for a range of multilingual applications, from tuning and evaluating machine translation systems to cross-lingual annotation projection. Unfortunately, automatically obtained parallel data (which is available in relative abundance) tends to be quite noisy. To obtain high-quality parallel data, we introduce a crowdsourcing paradigm in which workers with only basic bilingual proficiency identify translations from an automatically extracted corpus of parallel microblog messages. For less than $350, we obtained over 5000 parallel segments in five language pairs. Evaluated against expert annotations, the quality of the crowdsourced corpus is significantly better than existing automatic methods: it obtains a performance comparable to expert annotations when used in MERT tuning of a microblog MT system; and training a parallel sentence classifier with it also leads to improved results. The crowdsourced corpora will be made available at http://www.cs.cmu.edu/~lingwang/microtopia/.

Proceedings Article•10.3115/V1/W14-3351•

IPA and STOUT: Leveraging Linguistic and Source-based Features for Machine Translation Evaluation

Meritxell González¹, Alberto Barrón-Cedeño¹, Lluís Màrquez²•Institutions (2)

Polytechnic University of Catalonia¹, Qatar Computing Research Institute²

1 Jun 2014

TL;DR: The two UPC submissions to the WMT14 Metrics Shared Task take advantage of novel metrics that consider linguistic structures, lexical relationships, and semantics to compare both source and reference translation against the candidate translation.

Abstract: This paper describes the UPC submissions to the WMT14 Metrics Shared Task: UPC-IPA and UPC-STOUT. These metrics use a collection of evaluation measures integrated in ASIYA, a toolkit for machine translation evaluation. In addition to some standard metrics, the two submissions take advantage of novel metrics that consider linguistic structures, lexical relationships, and semantics to compare both source and reference translation against the candidate translation. The new metrics are available for several target languages other than English. In the official WMT14 evaluation, UPC-IPA and UPC-STOUT scored above the average in 7 out of 9 language pairs at the system level and 8 out of 9 at the segment level.

Proceedings Article•10.3115/V1/W14-3323•

Manawi: Using Multi-Word Expressions and Named Entities to Improve Machine Translation

Liling Tan¹, Santanu Pal¹•Institutions (1)

Saarland University¹

1 Jun 2014

TL;DR: The Manawi system showed the potential of improving translation quality by incorporating multiple NLP tools within the MT pipeline by introducing a novel filter method based on sentence-alignment features.

Abstract: We describe the Manawi (mAnEv) system submitted to the 2014 WMT translation shared task. We participated in the English-Hindi (EN-HI) and Hindi-English (HI-EN) language pairs and achieved 0.792 for the Translation Error Rate (TER) score for EN-HI, the lowest among the competing systems. Our main innovations are (i) the usage of outputs from NLP tools, viz. a bilingual multi-word expression extractor and a named-entity recognizer, to improve SMT quality and (ii) the introduction of a novel filter method based on sentence-alignment features. The Manawi system showed the potential of improving translation quality by incorporating multiple NLP tools within the MT pipeline.

Proceedings Article•10.3115/V1/W14-3339•

Referential Translation Machines for Predicting Translation Quality

Ergun Bicici¹, Andy Way¹•Institutions (1)

Dublin City University¹

26 Jun 2014

TL;DR: Referential translation machines remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations and achieve the top performance in WMT13 quality estimation task (QET13).

Abstract: We use referential translation machines (RTM) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs achieve top performance in automatic, accurate, and language independent prediction of sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations and achieve the top performance in WMT13 quality estimation task (QET13). We improve our RTM models with the Parallel FDA5 instance selection model, with additional features for predicting the translation performance, and with improved learning models. We develop RTM models for each WMT14 QET (QET14) subtask, obtain improvements over QET13 results, and rank 1st in all of the tasks and subtasks of QET14.

Proceedings Article•10.3115/V1/W14-3360•

An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation

Spence Green¹, Daniel Cer¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

1 Jun 2014

TL;DR: Extended features are introduced, which are more specific than dense features yet more general than lexicalized sparse features, and yield robust BLEU gains for both Arabic-English and Chinese-English relative to a strong feature-rich baseline.

Abstract: Scalable discriminative training methods are now broadly available for estimating phrase-based, feature-rich translation models. However, the sparse feature sets typically appearing in research evaluations are less attractive than standard dense features such as language and translation model probabilities: they often overfit, do not generalize, or require complex and slow feature extractors. This paper introduces extended features, which are more specific than dense features yet more general than lexicalized sparse features. Large-scale experiments show that extended features yield robust BLEU gains for both Arabic-English (+1.05) and Chinese-English (+0.67) relative to a strong feature-rich baseline. We also specialize the feature set to specific data domains, identify an objective function that is less prone to overfitting, and release fast, scalable, and language-independent tools for implementing the features.

Proceedings Article•10.3115/V1/W14-3358•

Dynamic Topic Adaptation for SMT using Distributional Profiles

Eva Hasler¹, Barry Haddow, Philipp Koehn•Institutions (1)

University of Edinburgh¹

1 Jun 2014

TL;DR: It is shown that combining information from both local and global test contexts helps to improve lexical selection and outperforms a baseline system by up to 1.15 BLEU.

Abstract: Despite its potential to improve lexical selection, most state-of-the-art machine translation systems take only minimal contextual information into account. We capture context with a topic model over distributional profiles built from the context words of each translation unit. Topic distributions are inferred for each translation unit and used to adapt the translation model dynamically to a given test context by measuring their similarity. We show that combining information from both local and global test contexts helps to improve lexical selection and outperforms a baseline system by up to 1.15 BLEU. We test our topic-adapted model on a diverse data set containing documents from three different domains and achieve competitive performance in comparison with two supervised domain-adapted systems.
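
The abstract does not say which similarity measure compares the topic distributions, so the following Python sketch assumes cosine similarity purely for illustration. It ranks candidate translation units by how closely their topic distribution matches that of the test context. The function names, the two-topic vectors, and the candidate labels are all hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def rank_by_topic(context_topics, candidates):
    """Sort candidate translations by topical similarity to the test context.

    `candidates` maps each translation option to the topic distribution
    inferred for its translation unit (toy values in this sketch).
    """
    return sorted(
        candidates,
        key=lambda option: cosine_similarity(context_topics, candidates[option]),
        reverse=True,
    )


# Toy example with two topics: (finance, nature).
context = [0.9, 0.1]
options = {
    "bank (financial institution)": [0.8, 0.2],
    "bank (river side)": [0.1, 0.9],
}
ranking = rank_by_topic(context, options)
```

In a real system such a similarity score would more plausibly enter the translation model as an additional feature than act as a hard ranking; the sort here only makes the effect on lexical selection visible.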

Proceedings Article•10.3115/V1/W14-3315•

The CMU Machine Translation Systems at WMT 2014

Austin Matthews¹, Waleed Ammar¹, Archna Bhatia¹, Weston Feely, Greg Hanneman¹, Eva Schlinger¹, Swabha Swayamdipta¹, Yulia Tsvetkov¹, Alon Lavie¹, Chris Dyer¹•Institutions (1)

Carnegie Mellon University¹

1 Jun 2014

TL;DR: Innovations include: a label coarsening scheme for syntactic tree-to-tree translation, a host of new discriminative features, several modules to create “synthetic translation options” that can generalize beyond what is directly observed in the training data, and a method of combining the output of multiple word aligners to uncover extra phrase pairs and grammar rules.

Abstract: We describe the CMU systems submitted to the 2014 WMT shared translation task. We participated in two language pairs, German–English and Hindi–English. Our innovations include: a label coarsening scheme for syntactic tree-to-tree translation, a host of new discriminative features, several modules to create “synthetic translation options” that can generalize beyond what is directly observed in the training data, and a method of combining the output of multiple word aligners to uncover extra phrase pairs and grammar rules.

Proceedings Article•10.3115/V1/W14-3344•

LIMSI Submission for WMT'14 QE Task

Guillaume Wisniewski¹, Nicolas Pécheux², Alexander Allauzen, François Yvon•Institutions (2)

University of Paris-Sud¹, Centre national de la recherche scientifique²

1 Jun 2014

TL;DR: LIMSI's participation in the WMT’14 Shared Task on Quality Estimation is described; the system relies on a random forest classifier, an ensemble method that has been shown to be very competitive for this kind of task, when only a few dense and continuous features are used.

Abstract: This paper describes LIMSI's participation in the WMT’14 Shared Task on Quality Estimation; we took part in the word-level quality estimation task for English to Spanish translations. Our system relies on a random forest classifier, an ensemble method that has been shown to be very competitive for this kind of task, when only a few dense and continuous features are used. Notably, only 16 features are used in our experiments. These features describe, on the one hand, the quality of the association between the source sentence and each target word and, on the other hand, the fluency of the hypothesis. Since the evaluation criterion is the F1 measure, a specific tuning strategy is proposed to select the optimal values for the hyper-parameters. Overall, our system achieves a 0.67 F1 score on a randomly extracted test set.
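
The abstract does not detail the tuning strategy, so the Python sketch below only illustrates the general idea of choosing a hyper-parameter by the metric that will be used for evaluation. It assumes, hypothetically, that the tuned value is a decision threshold on a classifier's scores and that it is picked on held-out data. The function names and the toy scores are not from the paper.

```python
def f1_score(gold, predicted):
    """F1 of the positive class (label 1), computed from raw counts."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(scores, gold, thresholds):
    """Return (threshold, F1) maximising F1 on held-out data.

    A word is labelled positive when its score reaches the threshold.
    Ties are resolved in favour of the earliest threshold in the list.
    """
    results = []
    for t in thresholds:
        predicted = [1 if s >= t else 0 for s in scores]
        results.append((t, f1_score(gold, predicted)))
    return max(results, key=lambda pair: pair[1])


# Toy held-out set: classifier scores and gold labels for four words.
dev_scores = [0.1, 0.4, 0.35, 0.8]
dev_gold = [0, 0, 1, 1]
threshold, f1 = best_threshold(dev_scores, dev_gold, [0.2, 0.3, 0.5])
```

Selecting the threshold by F1 rather than accuracy matters when the classes are imbalanced, since a threshold that maximises accuracy can leave the minority class almost never predicted.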

Proceedings Article•10.3115/V1/W14-3338•

SHEF-Lite 2.0: Sparse Multi-task Gaussian Processes for Translation Quality Estimation

Daniel Beck¹, Kashif Shah¹, Lucia Specia¹•Institutions (1)

University of Sheffield¹

1 Jun 2014

TL;DR: These submissions use the framework of Multi-task Gaussian Processes to combine multiple datasets in a multi-task setting, and also experiment with Sparse Gaussian Processes, which aim to speed up training and prediction by providing sensible sparse approximations.

Abstract: We describe our systems for the WMT14 Shared Task on Quality Estimation (subtasks 1.1, 1.2 and 1.3). Our submissions use the framework of Multi-task Gaussian Processes, where we combine multiple datasets in a multi-task setting. Due to the large size of our datasets we also experiment with Sparse Gaussian Processes, which aim to speed up training and prediction by providing sensible sparse approximations.

Proceedings Article•10.3115/V1/W14-3317•

The RWTH Aachen German-English Machine Translation System for WMT 2014

Stephan Peitz¹, Joern Wuebker¹, Markus Freitag¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

1 Jun 2014

TL;DR: This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the German→English translation task of the ACL 2014 Eighth Workshop on Statistical Machine Translation (WMT 2014).

Abstract: This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the German→English translation task of the ACL 2014 Eighth Workshop on Statistical Machine Translation (WMT 2014). Both hierarchical and phrase-based SMT systems are applied employing hierarchical phrase reordering and word class language models. For the phrase-based system, we run discriminative phrase training. In addition, we describe our preprocessing pipeline for German→English.

Proceedings Article•10.3115/V1/W14-3362•

Augmenting String-to-Tree and Tree-to-String Translation with Non-Syntactic Phrases

Matthias Huck, Hieu Hoang¹, Philipp Koehn•Institutions (1)

University of Edinburgh¹

1 Jun 2014

TL;DR: An effective technique to easily augment GHKM-style syntax-based machine translation systems (Galley et al., 2006) with phrase pairs that do not comply with any syntactic well-formedness constraints is presented.

Abstract: We present an effective technique to easily augment GHKM-style syntax-based machine translation systems (Galley et al., 2006) with phrase pairs that do not comply with any syntactic well-formedness constraints. Non-syntactic phrase pairs are distinguished from syntactic ones in order to avoid harmful effects. We apply our technique in state-of-the-art string-to-tree and tree-to-string setups. For tree-to-string translation, we furthermore investigate novel approaches for translating with source-syntax GHKM rules in association with input tree constraints and input tree features.

Proceedings Article•10.3115/V1/W14-3347•

VERTa participation in the WMT14 Metrics Task

Elisabet Comelles¹, Jordi Atserias²•Institutions (2)

University of Barcelona¹, Yahoo!²

1 Jun 2014

TL;DR: VERTa, a linguistically-motivated metric that combines linguistic features at different levels, is presented and the linguistic motivation on which the metric is based is provided, as well as the different modules in VERTa and how they are combined.

Abstract: In this paper we present VERTa, a linguistically-motivated metric that combines linguistic features at different levels. We provide the linguistic motivation on which the metric is based, as well as describe the different modules in VERTa and how they are combined. Finally, we describe the two versions of VERTa, VERTa-EQ and VERTa-W, sent to WMT14 and report results obtained in the experiments conducted with the WMT12 and WMT13 data into English.

Proceedings Article•10.3115/V1/W14-3321•

Machine Translation and Monolingual Postediting: The AFRL WMT-14 System

Lane Schwartz¹, Timothy R. Anderson², Jeremy Gwinnup, Katherine Young•Institutions (2)

University of Illinois at Urbana–Champaign¹, Air Force Research Laboratory²

1 Jun 2014

TL;DR: The AFRL statistical MT system and the improvements developed during the WMT14 evaluation campaign are described, along with efforts to make use of monolingual English speakers to correct the output of machine translation.

Abstract: This paper describes the AFRL statistical MT system and the improvements that were developed during the WMT14 evaluation campaign. As part of these efforts we experimented with a number of extensions to the standard phrase-based model that improve performance on Russian to English and Hindi to English translation tasks. In addition, we describe our efforts to make use of monolingual English speakers to correct the output of machine translation, and present the results of monolingual postediting of the entire 3003 sentences of the WMT14 Russian-English test set.

Proceedings Article•10.3115/V1/W14-3322•

CUNI in WMT14: Chimera Still Awaits Bellerophon

Aleš Tamchyna¹, Martin Popel¹, Rudolf Rosa¹, Ondrej Bojar¹•Institutions (1)

Charles University in Prague¹

1 Jun 2014

TL;DR: The English→Czech and English→Hindi submissions for this year’s WMT translation task are presented, along with experiments on reverse self-training to acquire more parallel data and on modeling target-side morphology.

Abstract: We present our English→Czech and English→Hindi submissions for this year’s WMT translation task. For English→Czech, we build upon last year’s CHIMERA and evaluate several setups. English→Hindi is a new language pair for this year. We experimented with reverse self-training to acquire more (synthetic) parallel data and with modeling target-side morphology.

Proceedings Article•10.3115/V1/W14-3331•

Combining Domain Adaptation Approaches for Medical Text Translation

Longyue Wang¹, Yi Lu¹, Derek F. Wong¹, Lidia S. Chao¹, Yiming Wang¹, Francisco Oliveira¹•Institutions (1)

University of Macau¹

1 Jun 2014

TL;DR: A number of simple and effective techniques for adapting statistical machine translation systems to the medical domain are explored; the resulting systems achieve the best BLEU scores for the Czech-English, English-German, and French-English language pairs and the second-best BLEU scores for the remaining pairs.

Abstract: This paper explores a number of simple and effective techniques to adapt statistical machine translation (SMT) systems in the medical domain. Comparative experiments are conducted on large corpora for six language pairs. We not only compare each adapted system with the baseline, but also combine them to further improve the domain-specific systems. Finally, we take part in the WMT2014 medical summary sentence translation constrained task, and our systems achieve the best BLEU scores for the Czech-English, English-German, and French-English language pairs and the second-best BLEU scores for the remaining pairs.

Proceedings Article•10.3115/V1/W14-3330•

LIMSI @ WMT’14 Medical Translation Task

Nicolas Pécheux¹, Li Gong¹, Quoc Khanh Do², Benjamin Marie³, Yulia Ivanishcheva, Alexander Allauzen, Thomas Lavergne, Jan Niehues⁴, Aurélien Max¹, François Yvon⁵•Institutions (5)

Centre national de la recherche scientifique¹, Université Paris-Saclay², National Institute of Information and Communications Technology³, Karlsruhe Institute of Technology⁴, University of Paris-Sud⁵

1 Jun 2014

TL;DR: LIMSI’s submission to the first medical translation task at WMT’14 is described, and results for English-French on the subtask of sentence translation from summaries of medical articles are reported.

Abstract: This paper describes LIMSI’s submission to the first medical translation task at WMT’14. We report results for English-French on the subtask of sentence translation from summaries of medical articles. Our main submission uses a combination of NCODE (n-gram-based) and MOSES (phrase-based) output and continuous-space language models used in a post-processing step for each system. Other characteristics of our submission include: the use of sampling for building MOSES’ phrase table; the implementation of the vector space model proposed by Chen et al. (2013); adaptation of the POS tagger used by NCODE to the medical domain; and a report of error analysis based on the typology of Vilar et al. (2006).

Proceedings Article•10.3115/V1/W14-3314•

The DCU-ICTCAS MT system at WMT 2014 on German-English Translation Task

Liangyou Li¹, Xiaofeng Wu¹, Santiago Cortés Vaíllo¹, Jun Xie¹, Andy Way¹, Qun Liu²•Institutions (2)

Dublin City University¹, Chinese Academy of Sciences²

1 Jun 2014

TL;DR: This paper describes the DCU submission to the WMT 2014 German-English translation task, which uses a phrase-based translation model with several popular techniques, including a Lexicalized Reordering Model, an Operation Sequence Model, and Language Model interpolation.

Abstract: This paper describes the DCU submission to the WMT 2014 German-English translation task. Our system uses a phrase-based translation model with several popular techniques, including a Lexicalized Reordering Model, an Operation Sequence Model, and Language Model interpolation. Our final submission is the result of system combination over several systems that differ in pre-processing and alignments.

Proceedings Article•10.3115/V1/W14-3304•

Yandex School of Data Analysis Russian-English Machine Translation System for WMT14

Alexey Borisov¹, Irina Galinskaya•Institutions (1)

Moscow State University¹

1 Jun 2014

TL;DR: This paper describes the Yandex School of Data Analysis Russian-English system and proposes a simple yet practical algorithm to transform a Russian sentence into a more easily translatable form before decoding.

Abstract: This paper describes the Yandex School of Data Analysis Russian-English system submitted to the ACL 2014 Ninth Workshop on Statistical Machine Translation shared translation task. We start with the system that we developed last year and investigate a few methods that were successful at the previous translation task, including an unpruned language model, an operation sequence model, and the new reparameterization of IBM Model 2. Next, we propose a simple yet practical algorithm to transform a Russian sentence into a more easily translatable form before decoding. The algorithm is based on the linguistic intuition of native Russian speakers who are also fluent in English.
