Showing papers presented at "Workshop on Statistical Machine Translation in 2007"
Proceedings Article•10.3115/1626355.1626389•
METEOR: An Automatic Metric for MT Evaluation with High Levels of Correlation with Human Judgments

[...]

Alon Lavie1, Abhaya Agarwal1•
Carnegie Mellon University1
23 Jun 2007
TL;DR: The technical details underlying the Meteor metric are recapped; the latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.
Abstract: Meteor is an automatic metric for Machine Translation evaluation which has been demonstrated to have high levels of correlation with human judgments of translation quality, significantly outperforming the more commonly used Bleu metric. It is one of several automatic metrics used in this year's shared task within the ACL WMT-07 workshop. This paper recaps the technical details underlying the metric and describes recent improvements in the metric. The latest release includes improved metric parameters and extends the metric to support evaluation of MT output in Spanish, French and German, in addition to English.

1,384 citations

Proceedings Article•10.3115/1626355.1626387•
Statistical Post-Editing on SYSTRAN's Rule-Based Translation System

[...]

Loïc Dugast, Jean Senellart, Philipp Koehn1•
University of Edinburgh1
23 Jun 2007
TL;DR: This article describes the combination of a SYSTRAN system with a "statistical post-editing" (SPE) system and documents qualitative analysis on two experiments performed in the shared task of the ACL 2007 Workshop on Statistical Machine Translation.
Abstract: This article describes the combination of a SYSTRAN system with a "statistical post-editing" (SPE) system. We document qualitative analysis on two experiments performed in the shared task of the ACL 2007 Workshop on Statistical Machine Translation. Comparative results and more integrated "hybrid" techniques are discussed.

150 citations

Proceedings Article•10.3115/1626355.1626393•
Linguistic Features for Automatic Evaluation of Heterogenous MT Systems

[...]

Jesús Giménez1, Lluís Màrquez1•
Polytechnic University of Catalonia1
23 Jun 2007
TL;DR: Experimental results are provided showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, especially when the systems under evaluation are of a different nature.
Abstract: Evaluation results recently reported by Callison-Burch et al. (2006) and Koehn and Monz (2006) revealed that, in certain cases, the BLEU metric may not be a reliable MT quality indicator. This happens, for instance, when the systems under evaluation are based on different paradigms, and therefore do not share the same lexicon. The reason is that, while MT quality aspects are diverse, BLEU limits its scope to the lexical dimension. In this work, we suggest using metrics which take into account linguistic features at more abstract levels. We provide experimental results showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, especially when the systems under evaluation are of a different nature.

135 citations

Proceedings Article•10.3115/1626355.1626366•
Using Word-Dependent Transition Models in HMM-Based Word Alignment for Statistical Machine Translation

[...]

Xiaodong He1•
Microsoft1
23 Jun 2007
TL;DR: A Bayesian learning based method to train word-dependent transition models for HMM-based word alignment gives consistent and significant alignment error rate (AER) reduction, and machine translation results show that word alignment based on this method can improve BLEU scores in a phrase-based machine translation system.
Abstract: In this paper, we present a Bayesian Learning based method to train word dependent transition models for HMM based word alignment. We present word alignment results on the Canadian Hansards corpus as compared to the conventional HMM and IBM model 4. We show that this method gives consistent and significant alignment error rate (AER) reduction. We also conducted machine translation (MT) experiments on the Europarl corpus. MT results show that word alignment based on this method can be used in a phrase-based machine translation system to yield up to 1% absolute improvement in BLEU score, compared to a conventional HMM, and 0.8% compared to an IBM model 4 based word alignment.

105 citations

Proceedings Article•10.3115/1626355.1626359•
Exploring Different Representational Units in English-to-Turkish Statistical Machine Translation

[...]

Kemal Oflazer1, Ilknur Durgar El-Kahlout•
Carnegie Mellon University1
23 Jun 2007
TL;DR: This work investigates different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish and finds that representing both Turkish and English at the morpheme-level but with some selective morpheme-grouping on the Turkish side of the training data provides a non-trivial improvement over a fully word-based baseline.
Abstract: We investigate different representational granularities for sub-lexical representation in statistical machine translation work from English to Turkish. We find that (i) representing both Turkish and English at the morpheme-level but with some selective morpheme-grouping on the Turkish side of the training data, (ii) augmenting the training data with "sentences" comprising only the content words of the original training data to bias root word alignment, (iii) reranking the n-best morpheme-sequence outputs of the decoder with a word-based language model, and (iv) using model iteration all provide a non-trivial improvement over a fully word-based baseline. Despite our very limited training data, we improve from 20.22 BLEU points for our simplest model to 25.08 BLEU points for an improvement of 4.86 points or 24% relative.

105 citations
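
The relative figure quoted in the abstract above follows from simple arithmetic on the two reported BLEU scores. A minimal sketch of that check; the function name `bleu_gain` is purely illustrative and does not come from the paper:

```python
def bleu_gain(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in BLEU points, gain relative to the baseline)."""
    absolute = improved - baseline
    relative = absolute / baseline
    return absolute, relative


# Scores reported in the abstract: simplest model vs. best model.
absolute, relative = bleu_gain(20.22, 25.08)
print(f"{absolute:.2f} points absolute, {relative:.0%} relative")
# prints: 4.86 points absolute, 24% relative
```

Dividing by the baseline score (rather than the improved score) is what yields the 24% figure.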

Proceedings Article•10.3115/1626355.1626377•
Domain Adaptation in Statistical Machine Translation with Mixture Modelling

[...]

Jorge Civera1, Alfons Juan1•
Polytechnic University of Valencia1
23 Jun 2007
TL;DR: This paper describes a mixture extension of the HMM alignment model and the derivation of Viterbi alignments to feed a state-of-the-art phrase-based system in statistical machine translation.
Abstract: Mixture modelling is a standard technique for density estimation, but its use in statistical machine translation (SMT) has just started to be explored. One of the main advantages of this technique is its capability to learn specific probability distributions that better fit subsets of the training dataset. This feature is even more important in SMT given the difficulty of translating polysemous terms whose meaning depends on the context in which the term appears. In this paper, we describe a mixture extension of the HMM alignment model and the derivation of Viterbi alignments to feed a state-of-the-art phrase-based system. Experiments carried out on the Europarl and News Commentary corpora show the potential interest and limitations of mixture modelling.

90 citations

Proceedings Article•10.3115/1626355.1626369•
Labelled Dependencies in Machine Translation Evaluation

[...]

Karolina Owczarzak1, Josef van Genabith1, Andy Way1•
Dublin City University1
23 Jun 2007
TL;DR: A dependency-based method for evaluating the quality of Machine Translation output, using labelled dependencies produced by a Lexical-Functional Grammar (LFG) parser, reaches high correlation with human scores.
Abstract: We present a method for evaluating the quality of Machine Translation (MT) output, using labelled dependencies produced by a Lexical-Functional Grammar (LFG) parser. Our dependency-based method, in contrast to most popular string-based evaluation metrics, does not unfairly penalize perfectly valid syntactic variations in the translation, and the addition of WordNet provides a way to accommodate lexical variation. In comparison with other metrics on 16,800 sentences of Chinese-English newswire text, our method reaches high correlation with human scores.

64 citations

Proceedings Article•10.3115/1626355.1626390•
English-to-Czech Factored Machine Translation

[...]

Ondřej Bojar
23 Jun 2007
TL;DR: Experimental results demonstrate significant improvement of translation quality in terms of BLEU.
Abstract: This paper describes experiments with English-to-Czech phrase-based machine translation. Additional annotation of input and output tokens (multiple factors) is used to explicitly model morphology. We vary the translation scenario (the setup of multiple factors) and the amount of information in the morphological tags. Experimental results demonstrate significant improvement of translation quality in terms of BLEU.

52 citations

Proceedings Article•10.3115/1626355.1626374•
Context-aware Discriminative Phrase Selection for Statistical Machine Translation

[...]

Jesús Giménez1, Lluís Màrquez1•
Polytechnic University of Catalonia1
23 Jun 2007
TL;DR: Inspired by common techniques used in Word Sense Disambiguation, classifiers based on local context to predict possible phrase translations are trained and a significant improvement is obtained in adequacy.
Abstract: In this work we revise the application of discriminative learning to the problem of phrase selection in Statistical Machine Translation. Inspired by common techniques used in Word Sense Disambiguation, we train classifiers based on local context to predict possible phrase translations. Our work extends that of Vickrey et al. (2005) in two main aspects. First, we move from word translation to phrase translation. Second, we move from the 'blank-filling' task to the 'full translation' task. We report results on a set of highly frequent source phrases, obtaining a significant improvement, especially with respect to adequacy, according to a rigorous process of manual evaluation.

49 citations

Proceedings Article•10.3115/1626355.1626368•
Human Evaluation of Machine Translation Through Binary System Comparisons

[...]

David Vilar1, Gregor Leusch1, Hermann Ney1, Rafael E. Banchs2•
RWTH Aachen University1, Polytechnic University of Catalonia2
23 Jun 2007
TL;DR: It is shown how confidence ranges for state-of-the-art evaluation measures such as WER and TER can be computed accurately and efficiently without having to resort to Monte Carlo estimates.
Abstract: We introduce a novel evaluation scheme for the human evaluation of different machine translation systems. Our method is based on direct comparison of two sentences at a time by human judges. These binary judgments are then used to decide between all possible rankings of the systems. The advantages of this new method are the lower dependency on extensive evaluation guidelines, and a tighter focus on a typical evaluation task, namely the ranking of systems. Furthermore we argue that machine translation evaluations should be regarded as statistical processes, both for human and automatic evaluation. We show how confidence ranges for state-of-the-art evaluation measures such as WER and TER can be computed accurately and efficiently without having to resort to Monte Carlo estimates. We give an example of our new evaluation scheme, as well as a comparison with classical automatic and human evaluation on data from a recent international evaluation campaign.

40 citations
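
The abstract above says that binary judgments are used to decide between all possible rankings of the systems, without giving the decision rule. One way to picture this is the sketch below, which assumes (this is not taken from the paper) that the preferred ranking is simply the one agreeing with the largest number of pairwise judgments. The system names and judgments are made-up illustrative data.

```python
from itertools import permutations


def best_ranking(systems, judgments):
    """Pick the ordering of `systems` that agrees with the most judgments.

    `judgments` is a list of (better, worse) pairs, one per human comparison.
    Exhaustive search over all orderings; only feasible for a few systems.
    """
    def agreement(ranking):
        position = {system: i for i, system in enumerate(ranking)}
        return sum(1 for better, worse in judgments
                   if position[better] < position[worse])

    return max(permutations(systems), key=agreement)


# Hypothetical judgments: each pair means "first system judged better".
judgments = [("A", "B"), ("A", "B"), ("B", "C"),
             ("B", "C"), ("A", "C"), ("C", "B")]
print(best_ranking(["A", "B", "C"], judgments))
```

Because the number of orderings grows factorially, this brute-force form only illustrates the idea for a small set of systems.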

Proceedings Article•10.3115/1626355.1626391•
Sentence level machine translation evaluation as a ranking problem: one step aside from BLEU

[...]

Yang Ye1, Ming Zhou2, Chin-Yew Lin2•
University of Michigan1, Microsoft2
23 Jun 2007
TL;DR: This paper proposes formulating MT evaluation as a ranking problem, as is often done in the practice of assessment by humans, and investigates the relative utility of several features under the ranking scenario.
Abstract: The paper proposes formulating MT evaluation as a ranking problem, as is often done in the practice of assessment by humans. Under the ranking scenario, the study also investigates the relative utility of several features. The results show greater correlation with human assessment at the sentence level, even when using an n-gram match score as a baseline feature. The feature contributing the most to the rank order correlation between automatic ranking and human assessment was the dependency structure relation rather than BLEU score and reference language model feature.
Proceedings Article•10.3115/1626355.1626392•
Localization of Difficult-to-Translate Phrases

[...]

Behrang Mohit1, Rebecca Hwa1•
University of Pittsburgh1
23 Jun 2007
TL;DR: It is verified that by isolating difficult-to-translate phrases and processing them as special cases, their negative impact on the translation of the rest of the sentences can be reduced.
Abstract: This paper studies the impact that difficult-to-translate source-language phrases might have on the machine translation process. We formulate the notion of difficulty as a measurable quantity; we show that a classifier can be trained to predict whether a phrase might be difficult to translate; and we develop a framework that makes use of the classifier and external resources (such as human translators) to improve the overall translation quality. Through experimental work, we verify that by isolating difficult-to-translate phrases and processing them as special cases, their negative impact on the translation of the rest of the sentences can be reduced.
Proceedings Article•
Multi-Engine Machine Translation with an Open-Source SMT Decoder

[...]

Yu Chen, Andreas Eisele, Christian Federmann, Eva Hasler, Michael Jellinghaus, Silke Theison 
1 Jun 2007
TL;DR: This work uses a variant of standard SMT technology to align translations from one or more RBMT systems with the source text and incorporates phrases extracted from these alignments into the phrase table of the SMT system.
Abstract: We describe an architecture that allows combining statistical machine translation (SMT) with rule-based machine translation (RBMT) in a multi-engine setup. We use a variant of standard SMT technology to align translations from one or more RBMT systems with the source text. We incorporate phrases extracted from these alignments into the phrase table of the SMT system and use the open-source decoder Moses to find good combinations of phrases from SMT training data with the phrases derived from RBMT. First experiments based on this hybrid architecture achieve promising results.
Proceedings Article•10.3115/1626355.1626370•
An Iteratively-Trained Segmentation-Free Phrase Translation Model for Statistical Machine Translation

[...]

Robert C. Moore1, Chris Quirk1•
Microsoft1
23 Jun 2007
TL;DR: A new iteratively-trained phrase translation model is proposed that produces translations of quality equal to or better than those produced by Koehn, et al.'s model, and translation quality degrades much more slowly as pruning is tightened to reduce translation time.
Abstract: Attempts to estimate phrase translation probabilities for statistical machine translation using iteratively-trained models have repeatedly failed to produce translations as good as those obtained by estimating phrase translation probabilities from surface statistics of bilingual word alignments as described by Koehn, et al. (2003). We propose a new iteratively-trained phrase translation model that produces translations of quality equal to or better than those produced by Koehn, et al.'s model. Moreover, with the new model, translation quality degrades much more slowly as pruning is tightened to reduce translation time.
Proceedings Article•10.3115/1626355.1626381•
Multi-engine machine translation with an open-source decoder for statistical machine translation

[...]

Yu Chen1, Andreas Eisele1, Christian Federmann, Eva Hasler2, Michael Jellinghaus1, Silke Theison1 •
Saarland University1, University of Cologne2
23 Jun 2007
TL;DR: This work uses a variant of standard SMT technology to align translations from one or more RBMT systems with the source text and incorporates phrases extracted from these alignments into the phrase table of the SMT system.
Abstract: We describe an architecture that allows combining statistical machine translation (SMT) with rule-based machine translation (RBMT) in a multi-engine setup. We use a variant of standard SMT technology to align translations from one or more RBMT systems with the source text. We incorporate phrases extracted from these alignments into the phrase table of the SMT system and use the open-source decoder Moses to find good combinations of phrases from SMT training data with the phrases derived from RBMT. First experiments based on this hybrid architecture achieve promising results.
Proceedings Article•10.3115/1626355.1626378•
Getting to Know Moses: Initial Experiments on German-English Factored Translation

[...]

Maria Holmqvist1, Sara Stymne1, Lars Ahrenberg1•
Linköping University1
23 Jun 2007
TL;DR: The paper is based on the idea of using an off-the-shelf parser to supply linguistic information to a factored translation model and compares the results of German-English translation to the shared task baseline system based on word form.
Abstract: We present results and experiences from our experiments with phrase-based statistical machine translation using Moses. The paper is based on the idea of using an off-the-shelf parser to supply linguistic information to a factored translation model and compare the results of German-English translation to the shared task baseline system based on word form. We report partial results for this model and results for two simplified setups. Our best setup takes advantage of the parser's lemmatization and decompounding. A qualitative analysis of compound translation shows that decompounding improves translation quality.
Proceedings Article•10.3115/1626355.1626375•
Ngram-Based Statistical Machine Translation Enhanced with Multiple Weighted Reordering Hypotheses

[...]

Marta R. Costa-jussà1, Josep Maria Crego1, Patrik Lambert1, Maxim Khalilov1, José A. R. Fonollosa1, José B. Mariño1, Rafael E. Banchs1 •
Polytechnic University of Catalonia1
23 Jun 2007
TL;DR: This 2007 Ngram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politecnica de Catalunya) in Barcelona introduces a target language model based on statistical classes, a feature for out-of-domain units and an improved optimization procedure.
Abstract: This paper describes the 2007 Ngram-based statistical machine translation system developed at the TALP Research Center of the UPC (Universitat Politecnica de Catalunya) in Barcelona. Emphasis is put on improvements and extensions of the previous year's system, which are highlighted and empirically compared. Mainly, these include a novel word ordering strategy based on: (1) statistically monotonizing the training source corpus and (2) a novel reordering approach based on weighted reordering graphs. In addition, this system introduces a target language model based on statistical classes, a feature for out-of-domain units and an improved optimization procedure. The paper provides details of this system's participation in the ACL 2007 Second Workshop on Statistical Machine Translation. Results on three pairs of languages are reported, namely from Spanish, French and German into English (and the other way round) for both the in-domain and out-of-domain tasks.
Proceedings Article•10.3115/1626355.1626382•
The ISL Phrase-Based MT System for the 2007 ACL Workshop on Statistical Machine Translation

[...]

Matthias Paulik1, Kay Rottmann2, Jan Niehues2, Silja Hildebrand1, Stephan Vogel1 •
Carnegie Mellon University1, Karlsruhe Institute of Technology2
23 Jun 2007
TL;DR: This paper presents results for a system combination of the ISL syntax-augmented MT system and the ISL phrase-based system by combining and rescoring the n-best lists of the two systems.
Abstract: In this paper we describe the Interactive Systems Laboratories (ISL) phrase-based machine translation system used in the shared task "Machine Translation for European Languages" of the ACL 2007 Workshop on Statistical Machine Translation. We present results for a system combination of the ISL syntax-augmented MT system and the ISL phrase-based system by combining and rescoring the n-best lists of the two systems. We also investigate the combination of two of our phrase-based systems translating from different source languages, namely Spanish and German, into their common target language, English.
Proceedings Article•10.3115/1626355.1626358•
Integration of an Arabic Transliteration Module into a Statistical Machine Translation System

[...]

Mehdi M. Kashani1, Eric Joanis, Roland Kuhn, George Foster, Fred Popowich1 •
Simon Fraser University1
23 Jun 2007
TL;DR: The experiments show that a transliteration module can help significantly in the situation where the test data is rich with previously unseen named entities and the improvement that can be attributed to the integration using the BLEU metric is evaluated.
Abstract: We provide an in-depth analysis of the integration of an Arabic-to-English transliteration system into a general-purpose phrase-based statistical machine translation system. We study the integration from different aspects and evaluate the improvement that can be attributed to the integration using the BLEU metric. Our experiments show that a transliteration module can help significantly in the situation where the test data is rich with previously unseen named entities. We obtain 70% and 53% of the theoretical maximum improvement we could achieve, as measured by an oracle on development and test sets respectively for OOV words (out of vocabulary source words not appearing in the phrase table).
Proceedings Article•10.3115/1626355.1626365•
Training Non-Parametric Features for Statistical Machine Translation

[...]

Patrick Nguyen1, Milind Mahajan1, Xiaodong He1•
Microsoft1
23 Jun 2007
TL;DR: This paper proposes to relax the linearity constraints on the combination, and hence relaxing constraints of monotonicity and independence of feature functions, and expands features into a non-parametric, non-linear, and high-dimensional space.
Abstract: Modern statistical machine translation systems may be seen as using two components: feature extraction, that summarizes information about the translation, and a log-linear framework to combine features. In this paper, we propose to relax the linearity constraints on the combination, and hence relaxing constraints of monotonicity and independence of feature functions. We expand features into a non-parametric, non-linear, and high-dimensional space. We extend empirical Bayes reward training of model parameters to meta parameters of feature generation. In effect, this allows us to trade away some human expert feature design for data. Preliminary results on a standard task show an encouraging improvement.
Proceedings Article•10.3115/1626355.1626376•
Analysis of Statistical and Morphological Classes to Generate Weigthed Reordering Hypotheses on a Statistical Machine Translation System

[...]

Marta R. Costa-jussà1, José A. R. Fonollosa1•
Polytechnic University of Catalonia1
23 Jun 2007
TL;DR: This paper experiments with different graph pruning which guarantees the translation quality improvement due to reordering at a very low increase of computational cost.
Abstract: One main challenge of statistical machine translation (SMT) is dealing with word order. The main idea of the statistical machine reordering (SMR) approach is to use the powerful techniques of SMT systems to generate a weighted reordering graph for SMT systems. This technique supplies reordering constraints to an SMT system, using statistical criteria. In this paper, we experiment with different graph pruning which guarantees the translation quality improvement due to reordering at a very low increase of computational cost. The SMR approach is capable of generalizing reorderings, which have been learned during training, by using word classes instead of words themselves. We experiment with statistical and morphological classes in order to choose those which capture the most probable reorderings. Satisfactory results are reported in the WMT07 Es/En task. Our system outperforms in terms of BLEU the WMT07 Official baseline system.
Proceedings Article•10.3115/1626355.1626385•
UCB System Description for the WMT 2007 Shared Task

[...]

Preslav Nakov1, Marti A. Hearst1•
University of California, Berkeley1
23 Jun 2007
TL;DR: For the WMT 2007 shared task, the UC Berkeley team employed monolingual syntactic paraphrases to provide syntactic variety to the source training set sentences and made use of results from prior research that shows that cognate pairs can improve word alignments.
Abstract: For the WMT 2007 shared task, the UC Berkeley team employed three techniques of interest. First, we used monolingual syntactic paraphrases to provide syntactic variety to the source training set sentences. Second, we trained two language models: a small in-domain model and a large out-of-domain model. Finally, we made use of results from prior research that shows that cognate pairs can improve word alignments. We contributed runs translating English to Spanish, French, and German using various combinations of these techniques.
Proceedings Article•10.3115/1626355.1626380•
Building a Statistical Machine Translation System for French Using the Europarl Corpus

[...]

Holger Schwenk1•
Centre national de la recherche scientifique1
23 Jun 2007
TL;DR: This paper describes the development of a statistical machine translation system based on the Moses decoder for the 2007 WMT shared tasks and uses a statistical language model that is based on a continuous representation of the words in the vocabulary.
Abstract: This paper describes the development of a statistical machine translation system based on the Moses decoder for the 2007 WMT shared tasks. Several different translation strategies were explored. We also use a statistical language model that is based on a continuous representation of the words in the vocabulary. By these means we expect to take better advantage of the limited amount of training data. Finally, we have investigated the usefulness of a second reference translation of the development data.
Proceedings Article•10.3115/1626355.1626386•
The Syntax Augmented MT (SAMT) System at the Shared Task for the 2007 ACL Workshop on Statistical Machine Translation

[...]

Andreas Zollmann1, Ashish Venugopal1, Matthias Paulik1, Stephan Vogel1•
Carnegie Mellon University1
23 Jun 2007
TL;DR: Parameters for components in the open-source SAMT toolkit that were used to generate translation results for the Spanish to English in-domain track of the shared task are described and relative performance is discussed against the authors' phrase-based submission.
Abstract: We describe the CMU-UKA Syntax Augmented Machine Translation system 'SAMT' used for the shared task "Machine Translation for European Languages" at the ACL 2007 Workshop on Statistical Machine Translation. Following an overview of syntax augmented machine translation, we describe parameters for components in our open-source SAMT toolkit that were used to generate translation results for the Spanish to English in-domain track of the shared task and discuss relative performance against our phrase-based submission.
Proceedings Article•10.3115/1626355.1626364•
Meta-Structure Transformation Model for Statistical Machine Translation

[...]

Jiadong Sun1, Tiejun Zhao1, Huashen Liang1•
Harbin Institute of Technology1
23 Jun 2007
TL;DR: A novel syntax-based model for statistical machine translation in which meta-structure (MS) and meta-structure sequence (SMS) of a parse tree are defined and which significantly outperforms Pharaoh, a state-of-the-art phrase-based system.
Abstract: We propose a novel syntax-based model for statistical machine translation in which meta-structure (MS) and meta-structure sequence (SMS) of a parse tree are defined. In this framework, a parse tree is decomposed into SMS to deal with the structure divergence and the alignment can be reconstructed at different levels of recombination of MS (RM). RM pairs extracted can perform the mapping between the sub-structures across languages. As a result, we obtain not only the translation for the target language, but also an SMS of its parse tree at the same time. Experiments with BLEU metric show that the model significantly outperforms Pharaoh, a state-of-the-art phrase-based system.
Proceedings Article•10.3115/1626355.1626363•
Speech-Input Multi-Target Machine Translation

[...]

Alicia Pérez1, M. Teresa González2, M. Inés Torres1, Francisco Casacuberta2•
University of the Basque Country1, University of Valencia2
23 Jun 2007
TL;DR: The multi-target model has been evaluated in a practical situation, and the results have been compared with those obtained using several mono-target models, showing that the multi-target one requires less memory.
Abstract: In order to simultaneously translate speech into multiple languages an extension of stochastic finite-state transducers is proposed. In this approach the speech translation model consists of a single network where acoustic models (in the input) and the multilingual model (in the output) are embedded. The multi-target model has been evaluated in a practical situation, and the results have been compared with those obtained using several mono-target models. Experimental results show that the multi-target one requires less memory. In addition, a single decoding is enough to get the speech translated into multiple languages.
Proceedings Article•10.3115/1626355.1626372•
Mixture-Model Adaptation for SMT

[...]

George Foster1, Roland Kuhn1•
National Research Council1
23 Jun 2007
TL;DR: A mixture-model approach to adapting a Statistical Machine Translation System for new domains is described, using weights that depend on text distances to mixture components; the best methods achieve gains of approximately one BLEU percentage point over a state-of-the-art non-adapted baseline system.
Abstract: We describe a mixture-model approach to adapting a Statistical Machine Translation System for new domains, using weights that depend on text distances to mixture components. We investigate a number of variants on this approach, including cross-domain versus dynamic adaptation; linear versus loglinear mixtures; language and translation model adaptation; different methods of assigning weights; and granularity of the source unit being adapted to. The best methods achieve gains of approximately one BLEU percentage point over a state-of-the-art non-adapted baseline system.
Proceedings Article•10.3115/1626355.1626360•
Can We Translate Letters

[...]

David Vilar1, Jan-Thorsten Peter1, Hermann Ney1•
RWTH Aachen University1
23 Jun 2007
TL;DR: This work tries to find out if a nearly unmodified state-of-the-art translation system is able to cope with the problem and whether it is capable of further generalizing translation rules, for example at the level of word suffixes and translation of unseen words.
Abstract: Current statistical machine translation systems handle the translation process as the transformation of a string of symbols into another string of symbols. Normally the symbols dealt with are the words in different languages, sometimes with some additional information included, like morphological data. In this work we try to push the approach to the limit, working not on the level of words, but treating both the source and target sentences as a string of letters. We try to find out if a nearly unmodified state-of-the-art translation system is able to cope with the problem and whether it is capable of further generalizing translation rules, for example at the level of word suffixes and translation of unseen words. Experiments are carried out for the translation of Catalan to Spanish.
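
Treating a sentence as a string of letters, as the abstract above describes, amounts to a change of tokenization before the text reaches an otherwise unmodified system. A minimal sketch of such a preprocessing step follows; the explicit word-boundary symbol `_` and both function names are assumptions made for illustration, not details given in the abstract.

```python
def to_letters(sentence: str, boundary: str = "_") -> list[str]:
    """Split a sentence into letter tokens, marking spaces with `boundary`."""
    return [boundary if ch == " " else ch for ch in sentence]


def from_letters(symbols: list[str], boundary: str = "_") -> str:
    """Rebuild the sentence from letter tokens produced by `to_letters`."""
    return "".join(" " if s == boundary else s for s in symbols)


letters = to_letters("bon dia")
print(" ".join(letters))      # letter tokens, space-separated for a word-based toolkit
print(from_letters(letters))  # round-trips to the original sentence
```

Keeping an explicit boundary symbol lets the word segmentation be recovered after translation; the sketch assumes the symbol never occurs in the text itself.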
Proceedings Article•10.3115/1626355.1626356•
Using Dependency Order Templates to Improve Generality in Translation

[...]

Arul Menezes1, Chris Quirk1•
Microsoft1
23 Jun 2007
TL;DR: A new reordering model based on dependency order templates is introduced, and it is shown that it outperforms both phrasal and treelet systems on in-domain and out-of-domain text, while limiting the search space.
Abstract: Today's statistical machine translation systems generalize poorly to new domains. Even small shifts can cause precipitous drops in translation quality. Phrasal systems rely heavily, for both reordering and contextual translation, on long phrases that simply fail to match out-of-domain text. Hierarchical systems attempt to generalize these phrases but their learned rules are subject to severe constraints. Syntactic systems can learn lexicalized and unlexicalized rules, but the joint modeling of lexical choice and reordering can narrow the applicability of learned rules. The treelet approach models reordering separately from lexical choice, using a discriminatively trained order model, which allows treelets to apply broadly, and has shown better generalization to new domains, but suffers a factorially large search space. We introduce a new reordering model based on dependency order templates, and show that it outperforms both phrasal and treelet systems on in-domain and out-of-domain text, while limiting the search space.
Proceedings Article•10.3115/1626355.1626384•
The "Noisier Channel": Translation from Morphologically Complex Languages

[...]

Chris Dyer1•
University of Maryland, College Park1
23 Jun 2007
TL;DR: A new paradigm for translation from inflectionally rich languages that was used in the University of Maryland statistical machine translation system for the WMT07 Shared Task is presented, and significant gains can be attained when translating from Czech, a language with considerable inflectional complexity, into English.
Abstract: This paper presents a new paradigm for translation from inflectionally rich languages that was used in the University of Maryland statistical machine translation system for the WMT07 Shared Task. The system is based on a hierarchical phrase-based decoder that has been augmented to translate ambiguous input given in the form of a confusion network (CN), a weighted finite state representation of a set of strings. By treating morphologically derived forms of the input sequence as possible, albeit more "costly" paths that the decoder may select, we find that significant gains (10% BLEU relative) can be attained when translating from Czech, a language with considerable inflectional complexity, into English.
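
The abstract above describes a confusion network as a weighted finite state representation of a set of strings, with morphologically derived forms offered as more "costly" alternative paths. The sketch below illustrates that data structure only. The tokens, the cost values, and the additive cost model are all hypothetical, and it does not reproduce the paper's decoder, which would weigh these costs together with its translation models.

```python
from itertools import product

# Hypothetical two-column confusion network. Each column maps an
# alternative token to an illustrative cost: the observed surface form
# costs 0.0 and a morphologically derived form costs 1.0 (assumed values).
confusion_network = [
    {"surface1": 0.0, "derived1": 1.0},
    {"surface2": 0.0, "derived2": 1.0},
]


def enumerate_paths(network):
    """Yield (tokens, total_cost) for every string the network encodes."""
    for combo in product(*(column.items() for column in network)):
        tokens = [token for token, _ in combo]
        total_cost = sum(cost for _, cost in combo)
        yield tokens, total_cost


def cheapest_path(network):
    """Return the path with the lowest summed cost."""
    return min(enumerate_paths(network), key=lambda path: path[1])


for tokens, cost in enumerate_paths(confusion_network):
    print(cost, " ".join(tokens))
```

A network with n columns of k alternatives each encodes k^n strings compactly, which is why a decoder can consider derived forms without enumerating every variant of the input up front.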