Sub-Sentential Alignment Using Substring Co-Occurrence Counts
Fabien Cromieres
- 20 Jul 2006
- pp 13-18
TL;DR: An efficient method to compute the co-occurrence counts of any pair of substrings in a parallel corpus, and an algorithm that makes use of these counts to create sub-sentential alignments on such a corpus, are presented.
Abstract: In this paper, we present an efficient method to compute the co-occurrence counts of any pair of substrings in a parallel corpus, and an algorithm that makes use of these counts to create sub-sentential alignments on such a corpus. This algorithm has the advantage of being as general as possible regarding the segmentation of text.
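To make the notion of substring co-occurrence counts concrete, here is a minimal brute-force sketch in Python. It is not the efficient method the paper presents; it only illustrates the quantity being counted. The function names, the `max_len` cap on substring length, and the toy corpus are all assumptions made for illustration. A pair of substrings is counted once for each sentence pair in which both occur.

```python
from collections import Counter
from itertools import product


def substrings(sentence, max_len):
    """Return the distinct character substrings of length 1..max_len."""
    return {
        sentence[i:i + n]
        for n in range(1, max_len + 1)
        for i in range(len(sentence) - n + 1)
    }


def cooccurrence_counts(corpus, max_len=3):
    """Count, for each (source substring, target substring) pair, the number
    of sentence pairs in which both substrings occur.

    Brute force: enumerates every pair explicitly, so it is only usable on
    tiny corpora. `max_len` is a hypothetical cap added to keep it tractable.
    """
    counts = Counter()
    for src, tgt in corpus:
        for pair in product(substrings(src, max_len), substrings(tgt, max_len)):
            counts[pair] += 1
    return counts


# Toy parallel corpus (hypothetical data): two aligned sentence pairs.
corpus = [("ab", "xy"), ("ac", "xz")]
counts = cooccurrence_counts(corpus, max_len=2)
```

Because the sketch works on raw characters, it makes no assumption about word segmentation, which matches the generality the abstract claims. Its cost grows with the product of the substring counts on each side, which is why an efficient method is needed at realistic corpus sizes.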
Citations
•Proceedings Article
Machine Translation without Words through Substring Alignment
Graham Neubig, Taro Watanabe, Shinsuke Mori, Tatsuya Kawahara
- 08 Jul 2012
TL;DR: This paper demonstrates that accurate machine translation is possible without the concept of "words," treating MT as a problem of transformation between character strings, and proposes a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment.
An Alignment Algorithm Using Belief Propagation and a Structure-Based Distortion Model
Fabien Cromieres, Sadao Kurohashi
- 30 Mar 2009
TL;DR: This paper shows the value of the Loopy Belief Propagation algorithm for training and using a simple alignment model where the expected marginal values needed for an efficient EM-training are not easily computable, and improves this model with a distortion model based on structure conservation.
23
Substring-based machine translation
TL;DR: This paper demonstrates that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.
12
The contribution of the notion of hapax legomena to word alignment
Adrien Lardilleux, Yves Lepage
- 01 Oct 2007
TL;DR: This paper shows that, in particular, the notion of hapax legomena may contribute to word alignment to a large extent and justifies a practical and common simplification of a standard alignment method.
11
•Dissertation
Contribution des basses fréquences à l'alignement sous-phrastique multilingue : une approche différentielle
Adrien Lardilleux
- 01 Jan 2010
TL;DR: This dissertation proposes a multilingual sub-sentential alignment method that allows any number of languages to be processed simultaneously.
9
References
Suffix arrays: a new method for on-line string searches
Udi Manber, Gene Myers
TL;DR: A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
2.4K
A Phrase-Based, Joint Probability Model for Statistical Machine Translation
Daniel Marcu, Daniel Wong
- 06 Jul 2002
TL;DR: A joint probability model for statistical machine translation is presented that automatically learns word and phrase equivalents from bilingual corpora; its translations are more accurate than those produced using IBM Model 4.
735
•Proceedings Article
Improved Alignment Models for Statistical Machine Translation
Franz Josef Och, Christoph Tillmann, Hermann Ney
- 01 Jan 1999
TL;DR: Improved alignment models for statistical machine translation are described, and experimental results are presented using the Verbmobil task (German-English, 6000-word vocabulary), which is a limited-domain spoken-language task.
614
A Word-to-Word Model of Translational Equivalence
I. Dan Melamed
- 07 Jul 1997
TL;DR: This paper proposed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level, which makes the model more suitable for applications that are not fully statistical.
123
Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases
Chris Callison-Burch, Colin Bannard, Josh Schroeder
- 25 Jun 2005
TL;DR: A novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations is described.