Scispace (Formerly Typeset)
Showing papers presented at "Workshop on Statistical Machine Translation in 2014"
Proceedings Article•10.3115/V1/W14-3348•
Meteor Universal: Language Specific Translation Evaluation for Any Target Language

[...]

Michael Denkowski1, Alon Lavie1•
Carnegie Mellon University1
1 Jun 2014
TL;DR: Meteor Universal brings language specific evaluation to previously unsupported target languages by automatically extracting linguistic resources from the bitext used to train MT systems and using a universal parameter set learned from pooling human judgments of translation quality from several language directions.
Abstract: This paper describes Meteor Universal, released for the 2014 ACL Workshop on Statistical Machine Translation. Meteor Universal brings language specific evaluation to previously unsupported target languages by (1) automatically extracting linguistic resources (paraphrase tables and function word lists) from the bitext used to train MT systems and (2) using a universal parameter set learned from pooling human judgments of translation quality from several language directions. Meteor Universal is shown to significantly outperform baseline BLEU on two new languages, Russian (WMT13) and Hindi (WMT14).

2,582 citations

Proceedings Article•10.3115/V1/W14-3302•
Findings of the 2014 Workshop on Statistical Machine Translation

[...]

Ondrej Bojar1, Christian Buck2, Christian Federmann2, Barry Haddow, Philipp Koehn, Johannes Leveling3, Christof Monz4, Pavel Pecina1, Matt Post5, Herve Saint-Amand2, Radu Soricut6, Lucia Specia7, Aleš Tamchyna1 •
Charles University in Prague1, University of Edinburgh2, Dublin City University3, University of Amsterdam4, Johns Hopkins University5, Google6, University of Sheffield7
1 Jun 2014
TL;DR: The results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task, are presented.
Abstract: This paper presents the results of the WMT14 shared tasks, which included a standard news translation task, a separate medical translation task, a task for run-time estimation of machine translation quality, and a metrics task. This year, 143 machine translation systems from 23 institutions were submitted to the ten translation directions in the standard translation task. An additional 6 anonymized systems were included, and were then evaluated both automatically and manually. The quality estimation task had four subtasks, with a total of 10 teams, submitting 57 entries

927 citations

Proceedings Article•10.3115/V1/W14-3346•
A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU

[...]

Boxing Chen1, Colin Cherry2•
National Research Council1, University of Alberta2
1 Jun 2014
TL;DR: Seven smoothing techniques for sentence-level BLEU are systematically compared; three of them are first proposed in this paper and correlate better with human judgments at the sentence level than the other smoothing techniques, and the performance of using the techniques in statistical machine translation tuning is also compared.
Abstract: BLEU is the de facto standard machine translation (MT) evaluation metric. However, because BLEU computes a geometric mean of n-gram precisions, it often correlates poorly with human judgment on the sentence-level. Therefore, several smoothing techniques have been proposed. This paper systematically compares 7 smoothing techniques for sentence-level BLEU. Three of them are first proposed in this paper, and they correlate better with human judgments on the sentence-level than other smoothing techniques. Moreover, we also compare the performance of using the 7 smoothing techniques in statistical machine translation tuning.

309 citations

Proceedings Article•10.3115/V1/W14-3340•
FBK-UPV-UEdin participation in the WMT14 Quality Estimation shared-task

[...]

José G. C. de Souza1, Jesús González-Rubio2, Christian Buck3, Marco Turchi4, Matteo Negri5 •
Dublin City University1, Polytechnic University of Valencia2, University of Edinburgh3, University of Sheffield4, Fondazione Bruno Kessler5
1 Jun 2014
TL;DR: The joint submission of Fondazione Bruno Kessler, Universitat Politècnica de València and University of Edinburgh to the Quality Estimation tasks of the Workshop on Statistical Machine Translation 2014 is described.
Abstract: This paper describes the joint submission of Fondazione Bruno Kessler, Universitat Politècnica de València and University of Edinburgh to the Quality Estimation tasks of the Workshop on Statistical Machine Translation 2014. We present our submissions for Task 1.2, 1.3 and 2. Our systems ranked first for Task 1.2 and for the Binary and Level1 settings in Task 2.

56 citations

Proceedings Article•10.3115/V1/W14-3311•
Phrasal: A Toolkit for New Directions in Statistical Machine Translation

[...]

Spence Green1, Daniel Cer1, Christopher D. Manning1•
Stanford University1
1 Jun 2014
TL;DR: A new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation, is presented, which includes features that support emerging research trends such as tuning with large feature sets, and web-based interactive machine translation.
Abstract: We present a new version of Phrasal, an open-source toolkit for statistical phrase-based machine translation. This revision includes features that support emerging research trends such as (a) tuning with large feature sets, (b) tuning on large datasets like the bitext, and (c) web-based interactive machine translation. A direct comparison with Moses shows favorable results in terms of decoding speed and tuning time.

50 citations

Proceedings Article•
Edinburgh’s Phrase-based Machine Translation Systems for WMT-14

[...]

Nadir Durrani1, Barry Haddow1, Philipp Koehn1, Kenneth Heafield1•
University of Edinburgh1
1 Jun 2014
TL;DR: UEDIN’s phrase-based submissions to the translation and medical translation shared tasks of the 2014 Workshop on Statistical Machine Translation (WMT) are described.
Abstract: This paper describes the University of Edinburgh’s (UEDIN) phrase-based submissions to the translation and medical translation shared tasks of the 2014 Workshop on Statistical Machine Translation (WMT). We participated in all language pairs. We have improved upon our 2013 system by i) using generalized representations, specifically automatic word clusters for translations out of English, ii) using unsupervised character-based models to translate unknown words in Russian-English and Hindi-English pairs, iii) synthesizing Hindi data from closely-related Urdu data, and iv) building huge language models on the common crawl corpus.

41 citations

Proceedings Article•10.3115/V1/W14-3352•
DiscoTK: Using Discourse Structure for Machine Translation Evaluation

[...]

Shafiq Joty1, Francisco Guzmán2, Lluís Màrquez2, Preslav Nakov3•
Qatar Foundation1, Qatar Computing Research Institute2, Cairo University3
1 Jun 2014
TL;DR: Novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference are presented.
Abstract: We present novel automatic metrics for machine translation evaluation that use discourse structure and convolution kernels to compare the discourse tree of an automatic translation with that of the human reference. We experiment with five transformations and augmentations of a base discourse tree representation based on the rhetorical structure theory, and we combine the kernel scores for each of them into a single score. Finally, we add other metrics from the ASIYA MT evaluation toolkit, and we tune the weights of the combination on actual human judgments. Experiments on the WMT12 and WMT13 metrics shared task datasets show correlation with human judgments that outperforms what the best systems that participated in these years achieved, both at the segment and at the system level.

40 citations

Proceedings Article•10.18653/V1/W15-3024•
Edinburgh’s Syntax-Based Systems at WMT 2014

[...]

Philip Williams, Rico Sennrich, Maria Nadejde1, Matthias Huck, Eva Hasler, Philipp Koehn •
University of Edinburgh1
1 Jun 2014
TL;DR: The string-to-tree systems built at the University of Edinburgh for the WMT 2014 shared translation task are described; the English-German system was improved through target-side compound splitting, morphosyntactic constraints, and refinements to parse tree annotation.
Abstract: This paper describes the string-to-tree systems built at the University of Edinburgh for the WMT 2014 shared translation task. We developed systems for English-German, Czech-English, French-English, German-English, Hindi-English, and Russian-English. This year we improved our English-German system through target-side compound splitting, morphosyntactic constraints, and refinements to parse tree annotation; we addressed the out-of-vocabulary problem using transliteration for Hindi and Russian and using morphological reduction for Russian; we improved our German-English system through tree binarization; and we reduced system development time by filtering the tuning sets.

34 citations

Proceedings Article•10.3115/V1/W14-3308•
The IIT Bombay Hindi-English Translation System at WMT 2014

[...]

Piyush Dungarwal1, Rajen Chatterjee1, Abhijit Mishra1, Anoop Kunchukuttan1, Ritesh Shah1, Pushpak Bhattacharyya1 •
Indian Institute of Technology Bombay1
1 Jun 2014
TL;DR: It is shown that the use of number, case and Tree Adjoining Grammar information as factors helps to improve English-Hindi translation, primarily by generating morphological inflections correctly.
Abstract: In this paper, we describe our English-Hindi and Hindi-English statistical systems submitted to the WMT14 shared task. The core components of our translation systems are phrase-based (Hindi-English) and factored (English-Hindi) SMT systems. We show that the use of number, case and Tree Adjoining Grammar information as factors helps to improve English-Hindi translation, primarily by generating morphological inflections correctly. We show improvements to the translation systems using pre-processing and post-processing components. To overcome the structural divergence between English and Hindi, we preorder the source side sentence to conform to the target language word order. Since the parallel corpus is limited, many words are not translated. We translate out-of-vocabulary words and transliterate named entities in a post-processing stage. We also investigate ranking of translations from multiple systems to select the best translation.

32 citations

Proceedings Article•10.3115/V1/W14-3342•
LIG System for Word Level QE task at WMT14

[...]

Ngoc Quang Luong, Laurent Besacier, Benjamin Lecouteux1•
University of Grenoble1
1 Jun 2014
TL;DR: A word-level QE system for the WMT 2014 shared task on the Spanish-English pair is described; it is optimized in several ways: tuning the classification threshold, combining with WMT 2013 data, and refining with a feature selection strategy on the development set before being applied to the test set for submission.
Abstract: This paper describes our word-level QE system for the WMT 2014 shared task on the Spanish-English pair. Compared to WMT 2013, this year's task is different due to the lack of SMT setting information and additional resources. We report how we overcome this challenge to retain most of the important features which performed well last year in our system. Novel features related to the availability of multiple systems' output (new this year) are also proposed and tested along with the baseline set. The system is optimized in several ways: tuning the classification threshold, combining with WMT 2013 data, and refining with a feature selection strategy on our development set, before dealing with the test set for submission.

32 citations

Proceedings Article•
Proceedings of the Ninth Workshop on Statistical Machine Translation

[...]

Ondřej Bojar1, Christian Buck2, Christian Federmann2, Barry Haddow2, Philipp Koehn2, Christof Monz3, Matt Post4, Lucia Specia5 •
Charles University in Prague1, University of Edinburgh2, University of Amsterdam3, Johns Hopkins University4, University of Sheffield5
1 Jan 2014
Proceedings Article•10.3115/V1/W14-3310•
EU-BRIDGE MT: Combined Machine Translation

[...]

Markus Freitag1, Stephan Peitz1, Joern Wuebker1, Hermann Ney1, Matthias Huck, Rico Sennrich, Nadir Durrani2, Maria Nadejde2, Philip Williams, Philipp Koehn, Teresa Herrmann3, Eunah Cho3, Alex Waibel4 •
RWTH Aachen University1, University of Edinburgh2, Karlsruhe Institute of Technology3, Facebook4
1 Jun 2014
TL;DR: Three research institutes involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the shared translation task of the evaluation campaign at the ACL 2014 Eighth Workshop on Statistical Machine Translation.
Abstract: This paper describes one of the collaborative efforts within EU-BRIDGE to further advance the state of the art in machine translation between two European language pairs, German→English and English→German. Three research institutes involved in the EU-BRIDGE project combined their individual machine translation systems and participated with a joint setup in the shared translation task of the evaluation campaign at the ACL 2014 Eighth Workshop on Statistical Machine Translation (WMT 2014). We combined up to nine different machine translation engines via system combination. RWTH Aachen University, the University of Edinburgh, and Karlsruhe Institute of Technology developed several individual systems which serve as system combination input. We devoted special attention to building syntax-based systems and combining them with the phrase-based ones. The joint setups yield empirical gains of up to 1.6 points in BLEU and 1.0 points in TER on the WMT newstest2013 test set compared to the best single systems.
Proceedings Article•10.3115/V1/W14-3356•
Crowdsourcing High-Quality Parallel Data Extraction from Twitter

[...]

Wang Ling1, Luís Marujo1, Chris Dyer1, Alan W. Black1, Isabel Trancoso2 •
Carnegie Mellon University1, INESC-ID2
1 Jun 2014
TL;DR: The quality of the crowdsourced corpus is significantly better than existing automatic methods: it obtains a performance comparable to expert annotations when used in MERT tuning of a microblog MT system; and training a parallel sentence classifier with it also leads to improved results.
Abstract: High-quality parallel data is crucial for a range of multilingual applications, from tuning and evaluating machine translation systems to cross-lingual annotation projection. Unfortunately, automatically obtained parallel data (which is available in relative abundance) tends to be quite noisy. To obtain high-quality parallel data, we introduce a crowdsourcing paradigm in which workers with only basic bilingual proficiency identify translations from an automatically extracted corpus of parallel microblog messages. For less than $350, we obtained over 5000 parallel segments in five language pairs. Evaluated against expert annotations, the quality of the crowdsourced corpus is significantly better than existing automatic methods: it obtains a performance comparable to expert annotations when used in MERT tuning of a microblog MT system; and training a parallel sentence classifier with it also leads to improved results. The crowdsourced corpora will be made available at http://www.cs.cmu.edu/~lingwang/microtopia/.
Proceedings Article•10.3115/V1/W14-3351•
IPA and STOUT: Leveraging Linguistic and Source-based Features for Machine Translation Evaluation

[...]

Meritxell González1, Alberto Barrón-Cedeño1, Lluís Màrquez2•
Polytechnic University of Catalonia1, Qatar Computing Research Institute2
1 Jun 2014
TL;DR: The two UPC submissions to the WMT14 Metrics Shared Task take advantage of novel metrics that consider linguistic structures, lexical relationships, and semantics to compare both source and reference translation against the candidate translation.
Abstract: This paper describes the UPC submissions to the WMT14 Metrics Shared Task: UPC-IPA and UPC-STOUT. These metrics use a collection of evaluation measures integrated in ASIYA, a toolkit for machine translation evaluation. In addition to some standard metrics, the two submissions take advantage of novel metrics that consider linguistic structures, lexical relationships, and semantics to compare both source and reference translation against the candidate translation. The new metrics are available for several target languages other than English. In the official WMT14 evaluation, UPC-IPA and UPC-STOUT scored above the average in 7 out of 9 language pairs at the system level and 8 out of 9 at the segment level.
Proceedings Article•10.3115/V1/W14-3323•
Manawi: Using Multi-Word Expressions and Named Entities to Improve Machine Translation

[...]

Liling Tan1, Santanu Pal1•
Saarland University1
1 Jun 2014
TL;DR: The Manawi system showed the potential of improving translation quality by incorporating multiple NLP tools within the MT pipeline, and introduced a novel filter method based on sentence-alignment features.
Abstract: We describe the Manawi (mAnEv) system submitted to the 2014 WMT translation shared task. We participated in the English-Hindi (EN-HI) and Hindi-English (HI-EN) language pairs and achieved 0.792 for the Translation Error Rate (TER) score for EN-HI, the lowest among the competing systems. Our main innovations are (i) the usage of outputs from NLP tools, viz. bilingual multi-word expression extractor and named-entity recognizer to improve SMT quality and (ii) the introduction of a novel filter method based on sentence-alignment features. The Manawi system showed the potential of improving translation quality by incorporating multiple NLP tools within the MT pipeline.
Proceedings Article•10.3115/V1/W14-3339•
Referential Translation Machines for Predicting Translation Quality

[...]

Ergun Bicici1, Andy Way1•
Dublin City University1
26 Jun 2014
TL;DR: Referential translation machines remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations and achieve the top performance in WMT13 quality estimation task (QET13).
Abstract: We use referential translation machines (RTM) for quality estimation of translation outputs. RTMs are a computational model for identifying the translation acts between any two data sets with respect to interpretants selected in the same domain, which are effective when making monolingual and bilingual similarity judgments. RTMs achieve top performance in automatic, accurate, and language independent prediction of sentence-level and word-level statistical machine translation (SMT) quality. RTMs remove the need to access any SMT system specific information or prior knowledge of the training data or models used when generating the translations and achieve the top performance in WMT13 quality estimation task (QET13). We improve our RTM models with the Parallel FDA5 instance selection model, with additional features for predicting the translation performance, and with improved learning models. We develop RTM models for each WMT14 QET (QET14) subtask, obtain improvements over QET13 results, and rank 1st in all of the tasks and subtasks of QET14.
Proceedings Article•10.3115/V1/W14-3360•
An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation

[...]

Spence Green1, Daniel Cer1, Christopher D. Manning1•
Stanford University1
1 Jun 2014
TL;DR: Extended features are introduced, which are more specific than dense features yet more general than lexicalized sparse features, and yield robust BLEU gains for both Arabic-English and Chinese-English relative to a strong feature-rich baseline.
Abstract: Scalable discriminative training methods are now broadly available for estimating phrase-based, feature-rich translation models. However, the sparse feature sets typically appearing in research evaluations are less attractive than standard dense features such as language and translation model probabilities: they often overfit, do not generalize, or require complex and slow feature extractors. This paper introduces extended features, which are more specific than dense features yet more general than lexicalized sparse features. Large-scale experiments show that extended features yield robust BLEU gains for both Arabic-English (+1.05) and Chinese-English (+0.67) relative to a strong feature-rich baseline. We also specialize the feature set to specific data domains, identify an objective function that is less prone to overfitting, and release fast, scalable, and language-independent tools for implementing the features.
Proceedings Article•10.3115/V1/W14-3358•
Dynamic Topic Adaptation for SMT using Distributional Profiles

[...]

Eva Hasler1, Barry Haddow, Philipp Koehn•
University of Edinburgh1
1 Jun 2014
TL;DR: It is shown that combining information from both local and global test contexts helps to improve lexical selection and outperforms a baseline system by up to 1.15 BLEU.
Abstract: Despite its potential to improve lexical selection, most state-of-the-art machine translation systems take only minimal contextual information into account. We capture context with a topic model over distributional profiles built from the context words of each translation unit. Topic distributions are inferred for each translation unit and used to adapt the translation model dynamically to a given test context by measuring their similarity. We show that combining information from both local and global test contexts helps to improve lexical selection and outperforms a baseline system by up to 1.15 BLEU. We test our topic-adapted model on a diverse data set containing documents from three different domains and achieve competitive performance in comparison with two supervised domain-adapted systems.
Proceedings Article•10.3115/V1/W14-3315•
The CMU Machine Translation Systems at WMT 2014

[...]

Austin Matthews1, Waleed Ammar1, Archna Bhatia1, Weston Feely, Greg Hanneman1, Eva Schlinger1, Swabha Swayamdipta1, Yulia Tsvetkov1, Alon Lavie1, Chris Dyer1 •
Carnegie Mellon University1
1 Jun 2014
TL;DR: Innovations include: a label coarsening scheme for syntactic tree-to-tree translation, a host of new discriminative features, several modules to create “synthetic translation options” that can generalize beyond what is directly observed in the training data, and a method of combining the output of multiple word aligners to uncover extra phrase pairs and grammar rules.
Abstract: We describe the CMU systems submitted to the 2014 WMT shared translation task. We participated in two language pairs, German–English and Hindi–English. Our innovations include: a label coarsening scheme for syntactic tree-to-tree translation, a host of new discriminative features, several modules to create “synthetic translation options” that can generalize beyond what is directly observed in the training data, and a method of combining the output of multiple word aligners to uncover extra phrase pairs and grammar rules.
Proceedings Article•10.3115/V1/W14-3344•
LIMSI Submission for WMT'14 QE Task

[...]

Guillaume Wisniewski1, Nicolas Pécheux2, Alexander Allauzen, François Yvon•
University of Paris-Sud1, Centre national de la recherche scientifique2
1 Jun 2014
TL;DR: LIMSI’s participation in the WMT’14 Shared Task on Quality Estimation is described; the system relies on a random forest classifier, an ensemble method that has been shown to be very competitive for this kind of task when only a few dense and continuous features are used.
Abstract: This paper describes LIMSI’s participation in the WMT’14 Shared Task on Quality Estimation; we took part in the word-level quality estimation task for English to Spanish translations. Our system relies on a random forest classifier, an ensemble method that has been shown to be very competitive for this kind of task, when only a few dense and continuous features are used. Notably, only 16 features are used in our experiments. These features describe, on the one hand, the quality of the association between the source sentence and each target word and, on the other hand, the fluency of the hypothesis. Since the evaluation criterion is the F1 measure, a specific tuning strategy is proposed to select the optimal values for the hyper-parameters. Overall, our system achieves a 0.67 F1 score on a randomly extracted test set.
Proceedings Article•10.3115/V1/W14-3338•
SHEF-Lite 2.0: Sparse Multi-task Gaussian Processes for Translation Quality Estimation

[...]

Daniel Beck1, Kashif Shah1, Lucia Specia1•
University of Sheffield1
1 Jun 2014
TL;DR: The submissions use the framework of Multi-task Gaussian Processes to combine multiple datasets in a multi-task setting, and experiment with Sparse Gaussian Processes, which aim to speed up training and prediction by providing sensible sparse approximations.
Abstract: We describe our systems for the WMT14 Shared Task on Quality Estimation (subtasks 1.1, 1.2 and 1.3). Our submissions use the framework of Multi-task Gaussian Processes, where we combine multiple datasets in a multi-task setting. Due to the large size of our datasets we also experiment with Sparse Gaussian Processes, which aim to speed up training and prediction by providing sensible sparse approximations.
Proceedings Article•10.3115/V1/W14-3317•
The RWTH Aachen German-English Machine Translation System for WMT 2014

[...]

Stephan Peitz1, Joern Wuebker1, Markus Freitag1, Hermann Ney1•
RWTH Aachen University1
1 Jun 2014
TL;DR: This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the German→English translation task of the ACL 2014 Eighth Workshop on Statistical Machine Translation (WMT 2014).
Abstract: This paper describes the statistical machine translation (SMT) systems developed at RWTH Aachen University for the German→English translation task of the ACL 2014 Eighth Workshop on Statistical Machine Translation (WMT 2014). Both hierarchical and phrase-based SMT systems are applied employing hierarchical phrase reordering and word class language models. For the phrase-based system, we run discriminative phrase training. In addition, we describe our preprocessing pipeline for German→English.
Proceedings Article•10.3115/V1/W14-3362•
Augmenting String-to-Tree and Tree-to-String Translation with Non-Syntactic Phrases

[...]

Matthias Huck, Hieu Hoang1, Philipp Koehn•
University of Edinburgh1
1 Jun 2014
TL;DR: An effective technique to easily augment GHKM-style syntax-based machine translation systems (Galley et al., 2006) with phrase pairs that do not comply with any syntactic well-formedness constraints is presented.
Abstract: We present an effective technique to easily augment GHKM-style syntax-based machine translation systems (Galley et al., 2006) with phrase pairs that do not comply with any syntactic well-formedness constraints. Non-syntactic phrase pairs are distinguished from syntactic ones in order to avoid harming effects. We apply our technique in state-of-the-art string-to-tree and tree-to-string setups. For tree-to-string translation, we furthermore investigate novel approaches for translating with source-syntax GHKM rules in association with input tree constraints and input tree features.
Proceedings Article•10.3115/V1/W14-3347•
VERTa participation in the WMT14 Metrics Task

[...]

Elisabet Comelles1, Jordi Atserias2•
University of Barcelona1, Yahoo!2
1 Jun 2014
TL;DR: VERTa, a linguistically-motivated metric that combines linguistic features at different levels, is presented and the linguistic motivation on which the metric is based is provided, as well as the different modules in VERTa and how they are combined.
Abstract: In this paper we present VERTa, a linguistically-motivated metric that combines linguistic features at different levels. We provide the linguistic motivation on which the metric is based, as well as describe the different modules in VERTa and how they are combined. Finally, we describe the two versions of VERTa, VERTa-EQ and VERTa-W, sent to WMT14 and report results obtained in the experiments conducted with the WMT12 and WMT13 data into English.
Proceedings Article•10.3115/V1/W14-3321•
Machine Translation and Monolingual Postediting: The AFRL WMT-14 System

[...]

Lane Schwartz1, Timothy R. Anderson2, Jeremy Gwinnup, Katherine Young•
University of Illinois at Urbana–Champaign1, Air Force Research Laboratory2
1 Jun 2014
TL;DR: The AFRL statistical MT system and the improvements developed during the WMT14 evaluation campaign are described, along with efforts to make use of monolingual English speakers to correct the output of machine translation.
Abstract: This paper describes the AFRL statistical MT system and the improvements that were developed during the WMT14 evaluation campaign. As part of these efforts we experimented with a number of extensions to the standard phrase-based model that improve performance on Russian to English and Hindi to English translation tasks. In addition, we describe our efforts to make use of monolingual English speakers to correct the output of machine translation, and present the results of monolingual postediting of the entire 3003 sentences of the WMT14 Russian-English test set.
Proceedings Article•10.3115/V1/W14-3322•
CUNI in WMT14: Chimera Still Awaits Bellerophon

[...]

Aleš Tamchyna1, Martin Popel1, Rudolf Rosa1, Ondrej Bojar1•
Charles University in Prague1
1 Jun 2014
TL;DR: English→Czech and English→Hindi submissions for this year’s WMT translation task are presented, including experiments with reverse self-training to acquire more parallel data and with modeling target-side morphology.
Abstract: We present our English→Czech and English→Hindi submissions for this year’s WMT translation task. For English→Czech, we build upon last year’s CHIMERA and evaluate several setups. English→Hindi is a new language pair for this year. We experimented with reverse self-training to acquire more (synthetic) parallel data and with modeling target-side morphology.
Proceedings Article•10.3115/V1/W14-3331•
Combining Domain Adaptation Approaches for Medical Text Translation

[...]

Longyue Wang1, Yi Lu1, Derek F. Wong1, Lidia S. Chao1, Yiming Wang1, Francisco Oliveira1 •
University of Macau1
1 Jun 2014
TL;DR: A number of simple and effective techniques to adapt statistical machine translation systems in the medical domain are explored; the resulting systems achieve the best BLEU scores for the Czech-English, English-German, and French-English language pairs and the second-best BLEU scores for the remaining pairs.
Abstract: This paper explores a number of simple and effective techniques to adapt statistical machine translation (SMT) systems in the medical domain. Comparative experiments are conducted on large corpora for six language pairs. We not only compare each adapted system with the baseline, but also combine them to further improve the domain-specific systems. Finally, we participate in the WMT2014 medical summary sentence translation constrained task and our systems achieve the best BLEU scores for the Czech-English, English-German, and French-English language pairs and the second-best BLEU scores for the remaining pairs.
Proceedings Article•10.3115/V1/W14-3330•
LIMSI @ WMT’14 Medical Translation Task

[...]

Nicolas Pécheux1, Li Gong1, Quoc Khanh Do2, Benjamin Marie3, Yulia Ivanishcheva, Alexander Allauzen, Thomas Lavergne, Jan Niehues4, Aurélien Max1, François Yvon5 •
Centre national de la recherche scientifique1, Université Paris-Saclay2, National Institute of Information and Communications Technology3, Karlsruhe Institute of Technology4, University of Paris-Sud5
1 Jun 2014
TL;DR: LIMSI’s submission to the first medical translation task at WMT’14 is described and results for English-French on the subtask of sentence translation from summaries of medical articles are reported.
Abstract: This paper describes LIMSI’s submission to the first medical translation task at WMT’14. We report results for English-French on the subtask of sentence translation from summaries of medical articles. Our main submission uses a combination of NCODE (n-gram-based) and MOSES (phrase-based) output and continuous-space language models used in a post-processing step for each system. Other characteristics of our submission include: the use of sampling for building MOSES’ phrase table; the implementation of the vector space model proposed by Chen et al. (2013); adaptation of the POS tagger used by NCODE to the medical domain; and a report of error analysis based on the typology of Vilar et al. (2006).
Proceedings Article•10.3115/V1/W14-3314•
The DCU-ICTCAS MT system at WMT 2014 on German-English Translation Task

[...]

Liangyou Li1, Xiaofeng Wu1, Santiago Cortés Vaíllo1, Jun Xie1, Andy Way1, Qun Liu2 •
Dublin City University1, Chinese Academy of Sciences2
1 Jun 2014
TL;DR: This paper describes the DCU submission to the WMT 2014 German-English translation task, which uses a phrase-based translation model with several popular techniques, including Lexicalized Reordering Model, Operation Sequence Model and Language Model interpolation.
Abstract: This paper describes the DCU submission to the WMT 2014 German-English translation task. Our system uses a phrase-based translation model with several popular techniques, including Lexicalized Reordering Model, Operation Sequence Model and Language Model interpolation. Our final submission is the result of system combination on several systems which have different pre-processing and alignments.
Proceedings Article•10.3115/V1/W14-3304•
Yandex School of Data Analysis Russian-English Machine Translation System for WMT14

[...]

Alexey Borisov1, Irina Galinskaya•
Moscow State University1
1 Jun 2014
TL;DR: This paper describes the Yandex School of Data Analysis Russian-English system and proposes a simple yet practical algorithm to transform a Russian sentence into a more easily translatable form before decoding.
Abstract: This paper describes the Yandex School of Data Analysis Russian-English system submitted to the ACL 2014 Ninth Workshop on Statistical Machine Translation shared translation task. We start with the system that we developed last year and investigate a few methods that were successful at the previous translation task, including an unpruned language model, the operation sequence model and the new reparameterization of IBM Model 2. Next we propose a simple yet practical algorithm to transform a Russian sentence into a more easily translatable form before decoding. The algorithm is based on the linguistic intuition of native Russian speakers who are also fluent in English.
