Open Access • Proceedings Article
Improving Statistical Machine Translation Accuracy Using Bilingual Lexicon Extraction with Paraphrases
Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
- 01 Dec 2014
- pp 262-271
TL;DR: Paraphrases are used to smooth the vectors used in comparable feature estimation with BLE. This improves the quality of the comparable features, which can improve the accuracy of the translation model and thus SMT performance.
Abstract: Statistical machine translation (SMT) suffers from the accuracy problem: the translation pairs and their feature scores in the translation model can be inaccurate. The accuracy problem is caused by the quality of the unsupervised methods used for translation model learning. Previous studies propose estimating comparable features for the translation pairs in the translation model from comparable corpora, to improve the accuracy of the translation model. Comparable feature estimation is based on bilingual lexicon extraction (BLE) technology. However, BLE suffers from the data sparseness problem, which makes the comparable features inaccurate. In this paper, we propose using paraphrases to address this problem. Paraphrases are used to smooth the vectors used in comparable feature estimation with BLE. In this way, we improve the quality of comparable features, which can improve the accuracy of the translation model and thus improve SMT performance. Experiments conducted on Chinese-English phrase-based SMT (PBSMT) verify the effectiveness of our proposed method.
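The abstract does not give the smoothing formula, so the following is only a minimal sketch of the general idea, under stated assumptions: each phrase has a sparse context vector, a phrase's vector is linearly interpolated with the mean vector of its paraphrases, and cosine similarity stands in for a comparable feature score. The function names, the `weight` parameter, and the toy data are all hypothetical and are not taken from the paper. The sketch also assumes the source-side vectors have already been projected into the target-side vocabulary.

```python
from collections import Counter
from math import sqrt


def smooth_vector(phrase, vectors, paraphrases, weight=0.5):
    """Interpolate a phrase's context vector with the mean of its paraphrases' vectors.

    `weight` is the share kept for the phrase's own vector (an assumed parameter,
    not a value from the paper).
    """
    own = vectors.get(phrase, Counter())
    para_vecs = [vectors[p] for p in paraphrases.get(phrase, []) if p in vectors]
    if not para_vecs:
        # No usable paraphrases: fall back to the unsmoothed vector.
        return Counter(own)
    smoothed = Counter()
    for key, value in own.items():
        smoothed[key] += weight * value
    share = (1.0 - weight) / len(para_vecs)
    for vec in para_vecs:
        for key, value in vec.items():
            smoothed[key] += share * value
    return smoothed


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data (hypothetical): a sparse vector for a rare phrase and a denser
# vector for one of its paraphrases, both in the target-side vocabulary.
vectors = {
    "rare phrase": Counter({"bank": 1}),
    "common paraphrase": Counter({"bank": 3, "money": 4}),
}
paraphrases = {"rare phrase": ["common paraphrase"]}
target_vector = Counter({"bank": 2, "money": 3})

before = cosine(vectors["rare phrase"], target_vector)
smoothed = smooth_vector("rare phrase", vectors, paraphrases)
after = cosine(smoothed, target_vector)
```

In this toy case the smoothed vector gains the `money` dimension that the sparse vector lacked, so `after` is larger than `before`. Whether smoothing helps on real data depends on paraphrase quality and on the interpolation scheme, neither of which the abstract specifies.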