Efficient data selection for machine translation

doi:10.1109/SLT.2008.4777890

Proceedings Article•10.1109/SLT.2008.4777890

Arindam Mandal, +7 more

- 01 Dec 2008

- pp 261-264

28

TL;DR: This paper introduces two methods for efficiently selecting training data to be translated by humans and shows that using one-fifth of the additional training data achieves similar or better translation performance than using all available data.

Citations

•Proceedings Article•10.3115/1699571.1699605

Discriminative Corpus Weight Estimation for Machine Translation

Spyros Matsoukas, +2 more

- 06 Aug 2009

TL;DR: A novel approach for automatically detecting and down-weighing certain parts of the training corpus by assigning a weight to each sentence in the training bitext so as to optimize a discriminative objective function on a designated tuning set is described.

136

•Proceedings Article

Instance Selection for Machine Translation using Feature Decay Algorithms

Ergun Bicici, +1 more

- 30 Jul 2011

TL;DR: It is shown that the feature decay rate has a very strong effect on the final translation quality whereas the initial feature values, inclusion of higher order features, or sentence length normalizations do not.

79

Journal Article•10.1109/TASLP.2014.2381882

Optimizing instance selection for statistical machine translation with feature decay algorithms

Ergun Bicici, +1 more

- 01 Feb 2015

- IEEE Transactions on Audio, Speech, and ...

TL;DR: FDA5 reduces the time to build a statistical machine translation system to about half with 1M words, using only 3% of the space for the phrase table and 8% of the overall space compared with a baseline system that uses all of the available training data, while differing from that baseline by only 0.58 BLEU points in out-of-domain translation.

50

Journal Article•10.1007/S10590-015-9176-1

Survey of data-selection methods in statistical machine translation

Sauleh Eetemadi, +3 more

- 01 Dec 2015

- Machine Translation

TL;DR: A comparative overview of research in statistical machine translation is provided based on application scenario, feature functions and search method.

46

The Regression Model of Machine Translation

Mehmet Ergun Biçici

- 01 Jan 2012

TL;DR: The results demonstrate that sparse regression models are better than L2 regularized regression for statistical machine translation in predicting target features, estimating word alignments, creating phrase tables, and generating translation outputs.

29

References

•Proceedings Article•10.3115/1075096.1075117

Minimum Error Rate Training in Statistical Machine Translation

Franz Josef Och

- 07 Jul 2003

TL;DR: It is shown that significantly better results can often be obtained if the final evaluation criterion is taken directly into account as part of the training procedure.

3.4K

•Proceedings Article

A Study of Translation Edit Rate with Targeted Human Annotation

Matthew Snover, +4 more

- 08 Aug 2006

TL;DR: A new, intuitive measure for evaluating machine translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments is defined.

2.5K

•Journal Article•10.5555/92858.92860

A statistical approach to machine translation

Peter Fitzhugh Brown, +7 more

- 01 Jun 1990

- Computational Linguistics

TL;DR: The application of the statistical approach to translation from French to English is described and preliminary results are given.

2K

Proceedings Article•10.1145/130385.130417

Query by committee

H. S. Seung, +2 more

- 01 Jul 1992

TL;DR: The query by committee algorithm, in which a committee of students is trained on the same data set, is proposed, and it is suggested that asymptotically finite information gain may be an important characteristic of good query algorithms.

2K

Journal Article•10.1016/J.SPECOM.2004.08.002

Combining active and semi-supervised learning for spoken language understanding

Dilek Hakkani-Tur, +2 more

- 12 Jan 2005

- Speech Communication

TL;DR: This paper combines active and semi-supervised learning to reduce the amount of manual labeling needed when training a spoken language understanding classifier on human-labeled utterance data.

235