Pivot language

Papers published on a yearly basis

Papers

Proceedings Article•10.3115/1557769.1557821•

Moses: Open Source Toolkit for Statistical Machine Translation

Philipp Koehn¹, Hieu Hoang¹, Alexandra Birch¹, Chris Callison-Burch¹, Marcello Federico, Nicola Bertoldi, Brooke Cowan², Wade Shen², C. Corbett Moran², Richard Zens³, Chris Dyer⁴, Ondrej Bojar⁵, Alexandra Elena Constantin⁶, Evan Herbst⁷•Institutions (7)

University of Edinburgh¹, Massachusetts Institute of Technology², RWTH Aachen University³, University of Maryland, College Park⁴, Charles University in Prague⁵, Williams College⁶, Cornell University⁷

25 Jun 2007

TL;DR: An open-source toolkit for statistical machine translation whose novel contributions are support for linguistically motivated factors, confusion network decoding, and efficient data formats for translation models and language models.

Abstract: We describe an open-source toolkit for statistical machine translation whose novel contributions are (a) support for linguistically motivated factors, (b) confusion network decoding, and (c) efficient data formats for translation models and language models. In addition to the SMT decoder, the toolkit also includes a wide variety of tools for training, tuning and applying the system to many translation tasks.

6,378 citations

Proceedings Article•10.3115/1073445.1073462•

Statistical phrase-based translation

Philipp Koehn¹, Franz Josef Och¹, Daniel Marcu¹•Institutions (1)

University of Southern California¹

27 May 2003

TL;DR: The empirical results suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations.

Abstract: We propose a new phrase-based translation model and decoding algorithm that enables us to evaluate and compare several previously proposed phrase-based translation models. Within our framework, we carry out a large number of experiments to better understand and explain why phrase-based models outperform word-based models. Our empirical results, which hold for all examined language pairs, suggest that the highest levels of performance can be obtained through relatively simple means: heuristic learning of phrase translations from word-based alignments and lexical weighting of phrase translations. Surprisingly, learning phrases longer than three words and learning phrases from high-accuracy word-level alignment models do not have a strong impact on performance. Learning only syntactically motivated phrases degrades the performance of our systems.

4,102 citations

Europarl: A Parallel Corpus for Statistical Machine Translation

Philipp Koehn

13 Sep 2005

TL;DR: A corpus of parallel text in 11 languages is collected from the proceedings of the European Parliament; the paper focuses on its acquisition and its application as training data for statistical machine translation (SMT).

Abstract: We collected a corpus of parallel text in 11 languages from the proceedings of the European Parliament, which are published on the web. This corpus has found widespread use in the NLP community. Here, we focus on its acquisition and its application as training data for statistical machine translation (SMT). We trained SMT systems for 110 language pairs, which reveal interesting clues into the challenges ahead.
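
The figure of 110 language pairs follows from counting every ordered (source, target) pair among 11 languages: 11 × 10 = 110. A minimal sketch of that count, using placeholder language names (lang0 to lang10 are hypothetical labels, not the corpus's actual language codes):

```python
from itertools import permutations

# Placeholder labels for the 11 corpus languages (hypothetical names).
languages = [f"lang{i}" for i in range(11)]

# Each translation direction is an ordered pair of two distinct languages.
directed_pairs = list(permutations(languages, 2))

print(len(directed_pairs))
```

Counting directions rather than unordered pairs matters here: a system translating from one language into another is trained separately from the system for the reverse direction.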

3,981 citations

Journal Article•10.1007/S10590-008-9041-6•

Pivot Language Approach for Phrase-Based Statistical Machine Translation

Hua Wu¹, Haifeng Wang¹•Institutions (1)

Toshiba¹

1 Sep 2007

TL;DR: This paper proposes a novel pivot-language method for phrase-based statistical machine translation that, measured by BLEU, significantly outperforms the standard model trained on a small bilingual corpus.

Abstract: This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language. To translate between languages Ls and Lt with limited bilingual resources, we bring in a third language, Lp, called the pivot language. For the language pairs Ls-Lp and Lp-Lt, there exist large bilingual corpora. Using only Ls-Lp and Lp-Lt bilingual corpora, we can build a translation model for Ls-Lt. The advantage of this method lies in the fact that we can perform translation between Ls and Lt even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual corpus. Moreover, with a small Ls-Lt bilingual corpus available, our method can further improve translation quality by using the additional Ls-Lp and Lp-Lt bilingual corpora.
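
The abstract does not say how the source-target model is derived from the two pivot-side models. One common formulation, assumed here for illustration only, marginalizes over pivot phrases: p(target | source) = Σ p(pivot | source) · p(target | pivot). The function name `triangulate` and the toy probabilities below are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def triangulate(src_to_pivot, pivot_to_tgt):
    """Build p(tgt | src) by summing p(pivot | src) * p(tgt | pivot) over pivot phrases.

    Both arguments are nested dicts: {phrase: {translation: probability}}.
    This is an assumed formulation, not necessarily the paper's exact model.
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                src_to_tgt[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in src_to_tgt.items()}

# Toy phrase tables (made-up probabilities): French -> English -> German.
src_to_pivot = {"maison": {"house": 0.8, "home": 0.2}}
pivot_to_tgt = {
    "house": {"Haus": 0.9, "Gebäude": 0.1},
    "home": {"Haus": 0.5, "Heim": 0.5},
}

table = triangulate(src_to_pivot, pivot_to_tgt)
for tgt, prob in sorted(table["maison"].items(), key=lambda kv: -kv[1]):
    print(f"maison -> {tgt}: {prob:.2f}")
```

In this toy example "Haus" collects probability mass through both pivot phrases, which is the point of summing over the pivot rather than picking a single best pivot translation.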

271 citations

Proceedings Article•

A Comparison of Pivot Methods for Phrase-Based Statistical Machine Translation

Masao Utiyama¹, Hitoshi Isahara¹•Institutions (1)

National Institute of Information and Communications Technology¹

1 Apr 2007

TL;DR: The phrase translation strategy significantly outperformed the sentence translation strategy and its relative performance was 0.92 to 0.97 compared to directly trained SMT systems.

Abstract: We compare two pivot strategies for phrase-based statistical machine translation (SMT), namely phrase translation and sentence translation. The phrase translation strategy means that we directly construct a phrase translation table (phrase-table) of the source and target language pair from two phrase-tables; one constructed from the source language and English and one constructed from English and the target language. We then use that phrase-table in a phrase-based SMT system. The sentence translation strategy means that we first translate a source language sentence into n English sentences and then translate these n sentences into target language sentences separately. Then, we select the highest scoring sentence from these target sentences. We conducted controlled experiments using the Europarl corpus to evaluate the performance of these pivot strategies as compared to directly trained SMT systems. The phrase translation strategy significantly outperformed the sentence translation strategy. Its relative performance was 0.92 to 0.97 compared to directly trained SMT systems.
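
The sentence translation strategy described above can be sketched as a two-stage pipeline. The sketch below is illustrative only: the two translators are stand-in dictionary lookups rather than real SMT systems, the function and variable names are hypothetical, and combining the two stage scores by multiplication is an assumption, since the abstract does not say how the final score is computed.

```python
def pivot_translate(sentence, src_to_en_nbest, en_to_tgt, n=3):
    """Translate via English: take n English candidates, translate each, keep the best.

    src_to_en_nbest(sentence) -> list of (english, score), best first.
    en_to_tgt(english)        -> (target, score).
    The combined score (product of stage scores) is an assumption.
    """
    best = None
    for english, score_en in src_to_en_nbest(sentence)[:n]:
        target, score_tgt = en_to_tgt(english)
        combined = score_en * score_tgt
        if best is None or combined > best[1]:
            best = (target, combined)
    return best

# Stand-in "systems" with made-up candidates and scores.
TOY_SRC_TO_EN = {
    "la maison bleue": [
        ("the blue house", 0.6),
        ("the house blue", 0.3),
        ("blue home", 0.1),
    ],
}
TOY_EN_TO_TGT = {
    "the blue house": ("das blaue Haus", 0.9),
    "the house blue": ("das Haus blau", 0.4),
    "blue home": ("blaues Heim", 0.7),
}

result = pivot_translate(
    "la maison bleue",
    lambda s: TOY_SRC_TO_EN[s],
    lambda e: TOY_EN_TO_TGT[e],
)
print(result[0])
```

Unlike the phrase translation strategy, which merges the two phrase-tables once and then decodes with a single system, this pipeline runs two full decoding passes per sentence, so errors made in the first pass can only be recovered if a better English candidate is among the n kept.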

232 citations

Year	Papers
2022	1
2021	16
2020	12
2019	21
2018	14
2017	15
