Top 40 papers presented at Workshop on Statistical Machine Translation in 2007

Proceedings Article•10.3115/1626355.1626389•

METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments

Alon Lavie¹, Abhaya Agarwal¹•Institutions (1)

23 Jun 2007

TL;DR: The technical details underlying the Meteor metric are recapped, the latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.

Abstract: Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year's shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the metric and describes recent improvements in the metric. The latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.

1,384 citations

Proceedings Article•10.3115/1626355.1626387•

Statistical Post-Editing on SYSTRAN's Rule-Based Translation System

Loïc Dugast, Jean Senellart, Philipp Koehn¹•Institutions (1)

University of Edinburgh¹

23 Jun 2007

TL;DR: This article describes the combination of a SYSTRAN system with a "statistical post-editing" (SPE) system and documents qualitative analysis on two experiments performed in the shared task of the ACL 2007 Workshop on Statistical Machine Translation.

Abstract: This article describes the combination of a SYSTRAN system with a "statistical post-editing" (SPE) system. We document qualitative analysis on two experiments performed in the shared task of the ACL 2007 Workshop on Statistical Machine Translation. Comparative results and more integrated "hybrid" techniques are discussed.

150 citations

Proceedings Article•10.3115/1626355.1626393•

Linguistic Features for Automatic Evaluation of Heterogenous MT Systems

Jesús Giménez¹, Lluís Màrquez¹•Institutions (1)

Polytechnic University of Catalonia¹

23 Jun 2007

TL;DR: Experimental results are provided showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, specially when the systems under evaluation are of a different nature.

Abstract: Evaluation results recently reported by Callison-Burch et al. (2006) and Koehn and Monz (2006), revealed that, in certain cases, the BLEU metric may not be a reliable MT quality indicator. This happens, for instance, when the systems under evaluation are based on different paradigms, and therefore, do not share the same lexicon. The reason is that, while MT quality aspects are diverse, BLEU limits its scope to the lexical dimension. In this work, we suggest using metrics which take into account linguistic features at more abstract levels. We provide experimental results showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, specially when the systems under evaluation are of a different nature.

135 citations

Proceedings Article•10.3115/1626355.1626366•

Using Word-Dependent Transition Models in HMM-Based Word Alignment for Statistical Machine Translation

Xiaodong He¹•Institutions (1)

Microsoft¹

23 Jun 2007

TL;DR: A Bayesian Learning based method to train word dependent transition models for HMM based word alignment gives consistent and significant alignment error rate (AER) reduction and machine translation results show that word alignment can be used in a phrase-based machine translation system.

Abstract: In this paper, we present a Bayesian Learning based method to train word dependent transition models for HMM based word alignment. We present word alignment results on the Canadian Hansards corpus as compared to the conventional HMM and IBM model 4. We show that this method gives consistent and significant alignment error rate (AER) reduction. We also conducted machine translation (MT) experiments on the Europarl corpus. MT results show that word alignment based on this method can be used in a phrase-based machine translation system to yield up to 1% absolute improvement in BLEU score, compared to a conventional HMM, and 0.8% compared to a IBM model 4 based word alignment.

105 citations

Proceedings Article•10.3115/1626355.1626359•

Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation

Kemal Oflazer¹, Ilknur Durgar El-Kahlout•Institutions (1)

Carnegie Mellon University¹

23 Jun 2007

TL;DR: This work investigates different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish and finds that representing both Turkish and English at the morpheme-level but with some selective morpheme-grouping on the Turkish side of the training data provides a non-trivial improvement over a fully word-based baseline.

Abstract: We investigate different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish. We find that (i) representing both Turkish and English at the morpheme-level but with some selective morpheme-grouping on the Turkish side of the training data, (ii) augmenting the training data with "sentences" comprising only the content words of the original training data to bias root word alignment, (iii) reranking the n-best morpheme-sequence outputs of the decoder with a word-based language model, and (iv) using model iteration all provide a non-trivial improvement over a fully word-based baseline. Despite our very limited training data, we improve from 20.22 BLEU points for our simplest model to 25.08 BLEU points for an improvement of 4.86 points or 24% relative.

105 citations
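
The figures quoted in the abstract above can be checked with a short calculation. This is only an arithmetic sketch: the function name `bleu_gain` is hypothetical and is not taken from the paper; the two BLEU scores are the ones the abstract reports.

```python
def bleu_gain(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in BLEU points, relative gain as a fraction of the baseline)."""
    absolute = improved - baseline
    relative = absolute / baseline
    return absolute, relative


# Scores reported in the abstract: simplest model vs. best model.
absolute, relative = bleu_gain(20.22, 25.08)
print(f"{absolute:.2f} points absolute, {relative:.0%} relative")
# prints "4.86 points absolute, 24% relative"
```

The relative figure divides the gain by the baseline score (4.86 / 20.22), which is why a gain of under five points reads as roughly a quarter.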

Proceedings Article•10.3115/1626355.1626377•

Domain Adaptation in Statistical Machine Translation with Mixture Modelling

Jorge Civera¹, Alfons Juan¹•Institutions (1)

Polytechnic University of Valencia¹

23 Jun 2007

TL;DR: This paper describes a mixture extension of the HMM alignment model and the derivation of Viterbi alignments to feed a state-of-the-art phrase-based system in statistical machine translation.

Abstract: Mixture modelling is a standard technique for density estimation, but its use in statistical machine translation (SMT) has just started to be explored. One of the main advantages of this technique is its capability to learn specific probability distributions that better fit subsets of the training dataset. This feature is even more important in SMT given the difficulties to translate polysemic terms whose semantic depends on the context in which that term appears. In this paper, we describe a mixture extension of the HMM alignment model and the derivation of Viterbi alignments to feed a state-of-the-art phrase-based system. Experiments carried out on the Europarl and News Commentary corpora show the potential interest and limitations of mixture modelling.

90 citations

Proceedings Article•10.3115/1626355.1626369•

Labelled Dependencies in Machine Translation Evaluation

Karolina Owczarzak¹, Josef van Genabith¹, Andy Way¹•Institutions (1)

Dublin City University¹

23 Jun 2007

TL;DR: A dependency-based method for evaluating the quality of Machine Translation output, using labelled dependencies produced by a Lexical-Functional Grammar (LFG) parser, which reaches high correlation with human scores.

Abstract: We present a method for evaluating the quality of Machine Translation (MT) output, using labelled dependencies produced by a Lexical-Functional Grammar (LFG) parser. Our dependency-based method, in contrast to most popular string-based evaluation metrics, does not unfairly penalize perfectly valid syntactic variations in the translation, and the addition of WordNet provides a way to accommodate lexical variation. In comparison with other metrics on 16,800 sentences of Chinese-English newswire text, our method reaches high correlation with human scores.

64 citations

Proceedings Article•10.3115/1626355.1626390•

English-to-Czech Factored Machine Translation

Ondřej Bojar

23 Jun 2007

TL;DR: Experimental results demonstrate significant improvement of translation quality in terms of BLEU.

Abstract: This paper describes experiments with English-to-Czech phrase-based machine translation. Additional annotation of input and output tokens (multiple factors) is used to explicitly model morphology. We vary the translation scenario (the setup of multiple factors) and the amount of information in the morphological tags. Experimental results demonstrate significant improvement of translation quality in terms of BLEU.

52 citations

Proceedings Article•10.3115/1626355.1626374•

Context-aware Discriminative Phrase Selection for Statistical Machine Translation

Jesús Giménez¹, Lluís Màrquez¹•Institutions (1)

Polytechnic University of Catalonia¹

23 Jun 2007

TL;DR: Inspired by common techniques used in Word Sense Disambiguation, classifiers based on local context to predict possible phrase translations are trained and a significant improvement is obtained in adequacy.

Abstract: In this work we revise the application of discriminative learning to the problem of phrase selection in Statistical Machine Translation. Inspired by common techniques used in Word Sense Disambiguation, we train classifiers based on local context to predict possible phrase translations. Our work extends that of Vickrey et al. (2005) in two main aspects. First, we move from word translation to phrase translation. Second, we move from the 'blank-filling' task to the 'full translation' task. We report results on a set of highly frequent source phrases, obtaining a significant improvement, specially with respect to adequacy, according to a rigorous process of manual evaluation.

49 citations

Proceedings Article•10.3115/1626355.1626368•

Human Evaluation of Machine Translation Through Binary System Comparisons

David Vilar¹, Gregor Leusch¹, Hermann Ney¹, Rafael E. Banchs²•Institutions (2)

RWTH Aachen University¹, Polytechnic University of Catalonia²

23 Jun 2007

TL;DR: It is shown how confidence ranges for state-of-the-art evaluation measures such as WER and TER can be computed accurately and efficiently without having to resort to Monte Carlo estimates.

Abstract: We introduce a novel evaluation scheme for the human evaluation of different machine translation systems. Our method is based on direct comparison of two sentences at a time by human judges. These binary judgments are then used to decide between all possible rankings of the systems. The advantages of this new method are the lower dependency on extensive evaluation guidelines, and a tighter focus on a typical evaluation task, namely the ranking of systems. Furthermore we argue that machine translation evaluations should be regarded as statistical processes, both for human and automatic evaluation. We show how confidence ranges for state-of-the-art evaluation measures such as WER and TER can be computed accurately and efficiently without having to resort to Monte Carlo estimates. We give an example of our new evaluation scheme, as well as a comparison with classical automatic and human evaluation on data from a recent international evaluation campaign.

40 citations
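
The idea of using binary judgments to decide between all possible rankings can be illustrated with a minimal sketch. The abstract does not say how the rankings are scored, so this example assumes the simplest criterion: pick the ordering that agrees with the largest number of pairwise judgments. The function name `best_ranking` and the toy judgments are hypothetical and are not the authors' method or data.

```python
from itertools import permutations


def best_ranking(systems, judgments):
    """Pick the ordering of systems that agrees with the most pairwise judgments.

    judgments is a list of (winner, loser) pairs, one per human comparison.
    Assumption: every ranking is scored by simple agreement counting.
    """
    def agreement(order):
        position = {name: i for i, name in enumerate(order)}
        return sum(
            1 for winner, loser in judgments
            if position[winner] < position[loser]
        )

    # Brute force over all orderings; only feasible for a handful of systems.
    return max(permutations(systems), key=agreement)


# Toy data: six comparisons between three hypothetical systems.
judgments = [("A", "B"), ("A", "B"), ("B", "C"), ("A", "C"), ("C", "B"), ("B", "C")]
ranking = best_ranking(["A", "B", "C"], judgments)
print(ranking)  # ('A', 'B', 'C')
```

Enumerating permutations grows factorially with the number of systems, so this form only suits small comparisons.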

Proceedings Article•10.3115/1626355.1626391•

Sentence level machine translation evaluation as a ranking problem: one step aside from BLEU

Yang Ye¹, Ming Zhou², Chin-Yew Lin²•Institutions (2)

University of Michigan¹, Microsoft²

23 Jun 2007

TL;DR: This paper proposed formulating MT evaluation as a ranking problem, as is often done in the practice of assessment by human, and investigated the relative utility of several features under the ranking scenario.

Abstract: The paper proposes formulating MT evaluation as a ranking problem, as is often done in the practice of assessment by human. Under the ranking scenario, the study also investigates the relative utility of several features. The results show greater correlation with human assessment at the sentence level, even when using an n-gram match score as a baseline feature. The feature contributing the most to the rank order correlation between automatic ranking and human assessment was the dependency structure relation rather than BLEU score and reference language model feature.

Proceedings Article•10.3115/1626355.1626392•

Localization of Difficult-to-Translate Phrases

Behrang Mohit¹, Rebecca Hwa¹•Institutions (1)

University of Pittsburgh¹

23 Jun 2007

TL;DR: It is verified that by isolating difficult-to-translate phrases and processing them as special cases, their negative impact on the translation of the rest of the sentences can be reduced.

Abstract: This paper studies the impact that difficult-to-translate source-language phrases might have on the machine translation process. We formulate the notion of difficulty as a measurable quantity; we show that a classifier can be trained to predict whether a phrase might be difficult to translate; and we develop a framework that makes use of the classifier and external resources (such as human translators) to improve the overall translation quality. Through experimental work, we verify that by isolating difficult-to-translate phrases and processing them as special cases, their negative impact on the translation of the rest of the sentences can be reduced.

Proceedings Article•

Multi-Engine Machine Translation with an Open-Source SMT Decoder

Yu Chen, Andreas Eisele, Christian Federmann, Eva Hasler, Michael Jellinghaus, Silke Theison

1 Jun 2007

TL;DR: This work uses a variant of standard SMT technology to align translations from one or more RBMT systems with the source text and incorporates phrases extracted from these alignments into the phrase table of the SMT system.

Abstract: We describe an architecture that allows to combine statistical machine translation (SMT) with rule-based machine translation (RBMT) in a multi-engine setup. We use a variant of standard SMT technology to align translations from one or more RBMT systems with the source text. We incorporate phrases extracted from these alignments into the phrase table of the SMT system and use the open-source decoder Moses to find good combinations of phrases from SMT training data with the phrases derived from RBMT. First experiments based on this hybrid architecture achieve promising results.

Proceedings Article•10.3115/1626355.1626370•

An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation

Robert C. Moore¹, Chris Quirk¹•Institutions (1)

Microsoft¹

23 Jun 2007

TL;DR: A new iteratively-trained phrase translation model is proposed that produces translations of quality equal to or better than those produced by Koehn, et al.'s model, and translation quality degrades much more slowly as pruning is tightened to reduce translation time.

Abstract: Attempts to estimate phrase translation probabilities for statistical machine translation using iteratively-trained models have repeatedly failed to produce translations as good as those obtained by estimating phrase translation probabilities from surface statistics of bilingual word alignments as described by Koehn, et al. (2003). We propose a new iteratively-trained phrase translation model that produces translations of quality equal to or better than those produced by Koehn, et al.'s model. Moreover, with the new model, translation quality degrades much more slowly as pruning is tightened to reduce translation time.

Proceedings Article•10.3115/1626355.1626381•

Multi-engine machine translation with an open-source decoder for statistical machine translation

Yu Chen¹, Andreas Eisele¹, Christian Federmann, Eva Hasler², Michael Jellinghaus¹, Silke Theison¹•Institutions (2)

Saarland University¹, University of Cologne²

23 Jun 2007

Proceedings Article•10.3115/1626355.1626378•

Getting to Know Moses: Initial Experiments on German-English Factored Translation

Maria Holmqvist¹, Sara Stymne¹, Lars Ahrenberg¹•Institutions (1)

Linköping University¹

23 Jun 2007

TL;DR: The paper is based on the idea of using an off-the-shelf parser to supply linguistic information to a factored translation model and compares the results of German-English translation to the shared task baseline system based on word form.

Abstract: We present results and experiences from our experiments with phrase-based statistical machine translation using Moses. The paper is based on the idea of using an off-the-shelf parser to supply linguistic information to a factored translation model and compare the results of German-English translation to the shared task baseline system based on word form. We report partial results for this model and results for two simplified setups. Our best setup takes advantage of the parser's lemmatization and decompounding. A qualitative analysis of compound translation shows that decompounding improves translation quality.

Proceedings Article•10.3115/1626355.1626375•

Ngram-Based Statistical Machine Translation Enhanced with Multiple Weighted Reordering Hypotheses

Marta R. Costa-jussà¹, Josep Maria Crego¹, Patrik Lambert¹, Maxim Khalilov¹, José A. R. Fonollosa¹, José B. Mariño¹, Rafael E. Banchs¹•Institutions (1)

Polytechnic University of Catalonia¹

23 Jun 2007

TL;DR: This 2007 Ngram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politecnica de Catalunya) in Barcelona introduces a target language model based on statistical classes, a feature for out-of-domain units and an improved optimization procedure.

Abstract: This paper describes the 2007 Ngram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politecnica de Catalunya) in Barcelona. Emphasis is put on improvements and extensions of the previous year's system, which are highlighted and empirically compared. Mainly, these include a novel word ordering strategy based on: (1) statistically monotonizing the training source corpus and (2) a novel reordering approach based on weighted reordering graphs. In addition, this system introduces a target language model based on statistical classes, a feature for out-of-domain units and an improved optimization procedure. The paper provides details of this system's participation in the ACL 2007 Second Workshop on Statistical Machine Translation. Results on three pairs of languages are reported, namely from Spanish, French and German into English (and the other way round) for both the in-domain and out-of-domain tasks.

Proceedings Article•10.3115/1626355.1626382•

The ISL Phrase-Based MT System for the 2007 ACL Workshop on Statistical Machine Translation

Matthias Paulik¹, Kay Rottmann², Jan Niehues², Silja Hildebrand¹, Stephan Vogel¹•Institutions (2)

Carnegie Mellon University¹, Karlsruhe Institute of Technology²

23 Jun 2007

TL;DR: This paper presents results for a system combination of the ISL syntax-augmented MT system and the ISL phrase-based system by combining and rescoring the n-best lists of the two systems.

Abstract: In this paper we describe the Interactive Systems Laboratories (ISL) phrase-based machine translation system used in the shared task "Machine Translation for European Languages" of the ACL 2007 Workshop on Statistical Machine Translation. We present results for a system combination of the ISL syntax-augmented MT system and the ISL phrase-based system by combining and rescoring the n-best lists of the two systems. We also investigate the combination of two of our phrase-based systems translating from different source languages, namely Spanish and German, into their common target language, English.

Proceedings Article•10.3115/1626355.1626358•

Integration of an Arabic Transliteration Module into a Statistical Machine Translation System

Mehdi M. Kashani¹, Eric Joanis, Roland Kuhn, George Foster, Fred Popowich¹•Institutions (1)

Simon Fraser University¹

23 Jun 2007

TL;DR: The experiments show that a transliteration module can help significantly in the situation where the test data is rich with previously unseen named entities and the improvement that can be attributed to the integration using the BLEU metric is evaluated.

Abstract: We provide an in-depth analysis of the integration of an Arabic-to-English transliteration system into a general-purpose phrase-based statistical machine translation system. We study the integration from different aspects and evaluate the improvement that can be attributed to the integration using the BLEU metric. Our experiments show that a transliteration module can help significantly in the situation where the test data is rich with previously unseen named entities. We obtain 70% and 53% of the theoretical maximum improvement we could achieve, as measured by an oracle on development and test sets respectively for OOV words (out of vocabulary source words not appearing in the phrase table).

Proceedings Article•10.3115/1626355.1626365•

Training Non-Parametric Features for Statistical Machine Translation

Patrick Nguyen¹, Milind Mahajan¹, Xiaodong He¹•Institutions (1)

Microsoft¹

23 Jun 2007

TL;DR: This paper proposes to relax the linearity constraints on the combination, and hence relaxing constraints of monotonicity and independence of feature functions, and expands features into a non-parametric, non-linear, and high-dimensional space.

Abstract: Modern statistical machine translation systems may be seen as using two components: feature extraction, that summarizes information about the translation, and a log-linear framework to combine features. In this paper, we propose to relax the linearity constraints on the combination, and hence relaxing constraints of monotonicity and independence of feature functions. We expand features into a non-parametric, non-linear, and high-dimensional space. We extend empirical Bayes reward training of model parameters to meta parameters of feature generation. In effect, this allows us to trade away some human expert feature design for data. Preliminary results on a standard task show an encouraging improvement.

Proceedings Article•10.3115/1626355.1626376•

Analysis of Statistical and Morphological Classes to Generate Weigthed Reordering Hypotheses on a Statistical Machine Translation System

Marta R. Costa-jussà¹, José A. R. Fonollosa¹•Institutions (1)

Polytechnic University of Catalonia¹

23 Jun 2007

TL;DR: This paper experiments with different graph pruning which guarantees the translation quality improvement due to reordering at a very low increase of computational cost.

Abstract: One main challenge of statistical machine translation (SMT) is dealing with word order. The main idea of the statistical machine reordering (SMR) approach is to use the powerful techniques of SMT systems to generate a weighted reordering graph for SMT systems. This technique supplies reordering constraints to an SMT system, using statistical criteria. In this paper, we experiment with different graph pruning which guarantees the translation quality improvement due to reordering at a very low increase of computational cost. The SMR approach is capable of generalizing reorderings, which have been learned during training, by using word classes instead of words themselves. We experiment with statistical and morphological classes in order to choose those which capture the most probable reorderings. Satisfactory results are reported in the WMT07 Es/En task. Our system outperforms in terms of BLEU the WMT07 Official baseline system.

Proceedings Article•10.3115/1626355.1626385•

UCB System Description for the WMT 2007 Shared Task

Preslav Nakov¹, Marti A. Hearst¹•Institutions (1)

University of California, Berkeley¹

23 Jun 2007

TL;DR: For the WMT 2007 shared task, the UC Berkeley team employed monolingual syntactic paraphrases to provide syntactic variety to the source training set sentences and made use of results from prior research that shows that cognate pairs can improve word alignments.

Abstract: For the WMT 2007 shared task, the UC Berkeley team employed three techniques of interest. First, we used monolingual syntactic paraphrases to provide syntactic variety to the source training set sentences. Second, we trained two language models: a small in-domain model and a large out-of-domain model. Finally, we made use of results from prior research that shows that cognate pairs can improve word alignments. We contributed runs translating English to Spanish, French, and German using various combinations of these techniques.

Proceedings Article•10.3115/1626355.1626380•

Building a Statistical Machine Translation System for French Using the Europarl Corpus

Holger Schwenk¹•Institutions (1)

Centre national de la recherche scientifique¹

23 Jun 2007

TL;DR: This paper describes the development of a statistical machine translation system based on the Moses decoder for the 2007 WMT shared tasks and uses a statistical language model that is based on a continuous representation of the words in the vocabulary.

Abstract: This paper describes the development of a statistical machine translation system based on the Moses decoder for the 2007 WMT shared tasks. Several different translation strategies were explored. We also use a statistical language model that is based on a continuous representation of the words in the vocabulary. By these means we expect to take better advantage of the limited amount of training data. Finally, we have investigated the usefulness of a second reference translation of the development data.

Proceedings Article•10.3115/1626355.1626386•

The Syntax Augmented MT (SAMT) System at the Shared Task for the 2007 ACL Workshop on Statistical Machine Translation

Andreas Zollmann¹, Ashish Venugopal¹, Matthias Paulik¹, Stephan Vogel¹•Institutions (1)

Carnegie Mellon University¹

23 Jun 2007

TL;DR: Parameters for components in the open-source SAMT toolkit that were used to generate translation results for the Spanish to English in-domain track of the shared task are described and relative performance is discussed against the authors' phrase-based submission.

Abstract: We describe the CMU-UKA Syntax Augmented Machine Translation system 'SAMT' used for the shared task "Machine Translation for European Languages" at the ACL 2007 Workshop on Statistical Machine Translation. Following an overview of syntax augmented machine translation, we describe parameters for components in our open-source SAMT toolkit that were used to generate translation results for the Spanish to English in-domain track of the shared task and discuss relative performance against our phrase-based submission.

Proceedings Article•10.3115/1626355.1626364•

Meta-Structure Transformation Model for Statistical Machine Translation

Jiadong Sun¹, Tiejun Zhao¹, Huashen Liang¹•Institutions (1)

Harbin Institute of Technology¹

23 Jun 2007

TL;DR: A novel syntax-based model for statistical machine translation is proposed in which meta-structure (MS) and meta-structure sequence (SMS) of a parse tree are defined; it significantly outperforms Pharaoh, a state-of-the-art phrase-based system.

Abstract: We propose a novel syntax-based model for statistical machine translation in which meta-structure (MS) and meta-structure sequence (SMS) of a parse tree are defined. In this framework, a parse tree is decomposed into SMS to deal with the structure divergence and the alignment can be reconstructed at different levels of recombination of MS (RM). RM pairs extracted can perform the mapping between the sub-structures across languages. As a result, we have got not only the translation for the target language, but an SMS of its parse tree at the same time. Experiments with BLEU metric show that the model significantly outperforms Pharaoh, a state-of-the-art phrase-based system.

Proceedings Article•10.3115/1626355.1626363•

Speech-Input Multi-Target Machine Translation

Alicia Pérez¹, M. Teresa González², M. Inés Torres¹, Francisco Casacuberta²•Institutions (2)

University of the Basque Country¹, University of Valencia²

23 Jun 2007

TL;DR: The multi-target model has been evaluated in a practical situation, and the results have been compared with those obtained using several mono-target models, showing that the multi-target one requires less amount of memory.

Abstract: In order to simultaneously translate speech into multiple languages an extension of stochastic finite-state transducers is proposed. In this approach the speech translation model consists of a single network where acoustic models (in the input) and the multilingual model (in the output) are embedded. The multi-target model has been evaluated in a practical situation, and the results have been compared with those obtained using several mono-target models. Experimental results show that the multi-target one requires less amount of memory. In addition, a single decoding is enough to get the speech translated into multiple languages.

Proceedings Article•10.3115/1626355.1626372•

Mixture-Model Adaptation for SMT

George Foster¹, Roland Kuhn¹•Institutions (1)

National Research Council¹

23 Jun 2007

TL;DR: A mixture-model approach to adapting a Statistical Machine Translation System for new domains, using weights that depend on text distances to mixture components to achieve gains of approximately one BLEU percentage point over a state-of-the art non-adapted baseline system is described.

Abstract: We describe a mixture-model approach to adapting a Statistical Machine Translation System for new domains, using weights that depend on text distances to mixture components. We investigate a number of variants on this approach, including cross-domain versus dynamic adaptation; linear versus loglinear mixtures; language and translation model adaptation; different methods of assigning weights; and granularity of the source unit being adapted to. The best methods achieve gains of approximately one BLEU percentage point over a state-of-the art non-adapted baseline system.
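
The contrast between linear and loglinear mixtures mentioned in the abstract can be made concrete with a small sketch. It shows only the two standard combination rules: a linear mixture takes a weighted sum of component probabilities, while a loglinear mixture takes a weighted product and renormalizes. The function names, toy distributions, and weights are hypothetical, and the sketch does not reproduce the paper's distance-based weighting.

```python
import math


def linear_mixture(dists, weights):
    """p(x) = sum_i w_i * p_i(x); weights are assumed to sum to 1."""
    events = dists[0].keys()
    return {x: sum(w * d[x] for w, d in zip(weights, dists)) for x in events}


def loglinear_mixture(dists, weights):
    """p(x) proportional to prod_i p_i(x)^w_i, renormalized over the shared events.

    Assumes every component assigns a non-zero probability to every event.
    """
    events = dists[0].keys()
    scores = {
        x: math.exp(sum(w * math.log(d[x]) for w, d in zip(weights, dists)))
        for x in events
    }
    total = sum(scores.values())
    return {x: s / total for x, s in scores.items()}


# Toy component models over two candidate translations (made-up numbers).
in_domain = {"bank": 0.8, "shore": 0.2}
out_of_domain = {"bank": 0.4, "shore": 0.6}
weights = [0.5, 0.5]

linear = linear_mixture([in_domain, out_of_domain], weights)
loglinear = loglinear_mixture([in_domain, out_of_domain], weights)
print(linear)
print(loglinear)
```

A linear mixture stays normalized whenever the weights sum to 1, whereas the loglinear form needs the explicit renormalization step shown here.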

Proceedings Article•10.3115/1626355.1626360•

Can We Translate Letters?


David Vilar¹, Jan-Thorsten Peter¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

23 Jun 2007

TL;DR: This work treats source and target sentences as strings of letters and tests whether a nearly unmodified state-of-the-art translation system can cope with letter-level translation and whether it can further generalize translation rules, for example at the level of word suffixes and the translation of unseen words.


Abstract: Current statistical machine translation systems handle the translation process as the transformation of a string of symbols into another string of symbols. Normally the symbols dealt with are the words in different languages, sometimes with some additional information included, like morphological data. In this work we try to push the approach to the limit, working not on the level of words, but treating both the source and target sentences as strings of letters. We try to find out if a nearly unmodified state-of-the-art translation system is able to cope with the problem and whether it is capable of further generalizing translation rules, for example at the level of word suffixes and translation of unseen words. Experiments are carried out for the translation of Catalan to Spanish.
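
One way to present letter strings to a system that expects space-separated tokens is sketched below. This is only an assumed preprocessing scheme, not the one used in the paper: the `BOUNDARY` marker and both function names are hypothetical.

```python
# Hypothetical marker standing in for the original space between words.
# Assumption: the marker does not otherwise occur in the text.
BOUNDARY = "_"

def to_letters(sentence):
    """Split a sentence into space-separated letters, marking word boundaries."""
    normalized = " ".join(sentence.split())  # collapse runs of whitespace
    return " ".join(BOUNDARY if ch == " " else ch for ch in normalized)

def from_letters(letter_string):
    """Invert to_letters: join the letters and restore spaces at boundary markers."""
    return "".join(" " if tok == BOUNDARY else tok for tok in letter_string.split())

letters = to_letters("bon dia")
restored = from_letters(letters)
```

Because every letter becomes its own token, an otherwise unchanged word-based pipeline sees much longer sequences over a much smaller vocabulary, which is what allows rules to form at the level of suffixes.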


Proceedings Article•10.3115/1626355.1626356•

Using Dependency Order Templates to Improve Generality in Translation


Arul Menezes¹, Chris Quirk¹•Institutions (1)

Microsoft¹

23 Jun 2007

TL;DR: A new reordering model based on dependency order templates is introduced, and it is shown that it outperforms both phrasal and treelet systems on in-domain and out-of-domain text, while limiting the search space.


Abstract: Today's statistical machine translation systems generalize poorly to new domains. Even small shifts can cause precipitous drops in translation quality. Phrasal systems rely heavily, for both reordering and contextual translation, on long phrases that simply fail to match out-of-domain text. Hierarchical systems attempt to generalize these phrases but their learned rules are subject to severe constraints. Syntactic systems can learn lexicalized and unlexicalized rules, but the joint modeling of lexical choice and reordering can narrow the applicability of learned rules. The treelet approach models reordering separately from lexical choice, using a discriminatively trained order model, which allows treelets to apply broadly, and has shown better generalization to new domains, but suffers a factorially large search space. We introduce a new reordering model based on dependency order templates, and show that it outperforms both phrasal and treelet systems on in-domain and out-of-domain text, while limiting the search space.


Proceedings Article•10.3115/1626355.1626384•

The "Noisier Channel": Translation from Morphologically Complex Languages


Chris Dyer¹•Institutions (1)

University of Maryland, College Park¹

23 Jun 2007

TL;DR: This paper presents a new paradigm for translation from inflectionally rich languages, used in the University of Maryland statistical machine translation system for the WMT07 Shared Task, and reports significant gains when translating from Czech, a language with considerable inflectional complexity, into English.


Abstract: This paper presents a new paradigm for translation from inflectionally rich languages that was used in the University of Maryland statistical machine translation system for the WMT07 Shared Task. The system is based on a hierarchical phrase-based decoder that has been augmented to translate ambiguous input given in the form of a confusion network (CN), a weighted finite-state representation of a set of strings. By treating morphologically derived forms of the input sequence as possible, albeit more "costly", paths that the decoder may select, we find that significant gains (10% BLEU relative) can be attained when translating from Czech, a language with considerable inflectional complexity, into English.
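
A confusion network of the kind described can be pictured as a sequence of columns, each offering alternative tokens with a cost. The sketch below is an illustration only: the data layout, the `paths` helper, the example words, and the cost values are assumptions, and a real decoder would search the network rather than enumerate every path.

```python
from itertools import product

def paths(confusion_network):
    """Yield (tokens, total_cost) for every path through the network.

    The network is a list of columns; each column is a list of
    (token, cost) alternatives for one input position.
    """
    for choice in product(*confusion_network):
        tokens = [token for token, _ in choice]
        total_cost = sum(cost for _, cost in choice)
        yield tokens, total_cost

# Illustrative Czech input: each column holds the surface form (cost 0.0)
# and a morphologically derived alternative, here the lemma, at a
# hypothetical extra cost of 1.0.
cn = [
    [("vidím", 0.0), ("vidět", 1.0)],
    [("psa", 0.0), ("pes", 1.0)],
]

all_paths = list(paths(cn))
cheapest = min(all_paths, key=lambda path: path[1])
```

The surface-form path stays cheapest, but the derived forms remain available whenever the translation model has better coverage for them than for the inflected surface word.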
