Sub-Sentential Alignment Using Substring Co-Occurrence Counts

doi:10.3115/1557856.1557860

Open Access • Proceedings Article • 10.3115/1557856.1557860

Fabien Cromieres

- 20 Jul 2006

- pp 13-18

12

TL;DR: This paper presents an efficient method for computing the co-occurrence counts of any pair of substrings in a parallel corpus, and an algorithm that uses these counts to create sub-sentential alignments on such a corpus.
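To make the notion of substring co-occurrence counts concrete, here is a minimal brute-force sketch in Python. It is not the efficient method the paper presents; it simply enumerates every contiguous token sequence up to a fixed length and counts the sentence pairs in which a source substring and a target substring both appear. The function names, the `max_len` cap, and the toy corpus are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def substrings(tokens, max_len):
    """Return the set of contiguous token sequences of length 1..max_len."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def cooccurrence_counts(corpus, max_len=2):
    """Count, for each (source substring, target substring) pair, the number
    of sentence pairs in which both occur. Brute force, for illustration."""
    counts = Counter()
    for src, tgt in corpus:
        src_subs = substrings(src.split(), max_len)
        tgt_subs = substrings(tgt.split(), max_len)
        for pair in product(src_subs, tgt_subs):
            counts[pair] += 1
    return counts


# Hypothetical toy parallel corpus (English / French).
toy_corpus = [
    ("the cat", "le chat"),
    ("the dog", "le chien"),
    ("a cat", "un chat"),
]
counts = cooccurrence_counts(toy_corpus)
print(counts[(("cat",), ("chat",))])
```

The number of pairs grows quickly with sentence length, so this enumeration only suits tiny corpora; avoiding that blow-up is what an efficient method has to address.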

Citations

•Proceedings Article

Machine Translation without Words through Substring Alignment

Graham Neubig, +3 more

- 08 Jul 2012

TL;DR: This paper demonstrates that accurate machine translation is possible without the concept of "words," treating MT as a problem of transformation between character strings, and proposes a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment.

46

•Proceedings Article•10.3115/1609067.1609085

An Alignment Algorithm Using Belief Propagation and a Structure-Based Distortion Model

Fabien Cromieres, +1 more

- 30 Mar 2009

TL;DR: This paper shows that the Loopy Belief Propagation algorithm is useful for training and using a simple alignment model in which the expected marginal values needed for efficient EM training are not easily computable, and improves this model with a distortion model based on structure conservation.

23

Journal Article•10.1007/S10590-013-9136-6

Substring-based machine translation

Graham Neubig, +3 more

- 01 Jun 2013

- Machine Translation

TL;DR: This paper demonstrates that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.

12

The contribution of the notion of hapax legomena to word alignment

Adrien Lardilleux, +1 more

- 01 Oct 2007

TL;DR: This paper shows that, in particular, the notion of hapax legomena may contribute to word alignment to a large extent and justifies a practical and common simplification of a standard alignment method.

11

•Dissertation

Contribution des basses fréquences à l'alignement sous-phrastique multilingue : une approche différentielle

Adrien Lardilleux

- 01 Jan 2010

TL;DR: This dissertation proposes a multilingual sub-sentential alignment method that can process any number of languages simultaneously.

9

References

Journal Article•10.1137/0222058

Suffix arrays: a new method for on-line string searches

Udi Manber, +1 more

- 01 Oct 1993

- SIAM Journal on Computing

TL;DR: A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
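As a rough illustration of the data structure, the sketch below builds a suffix array by sorting suffix start positions and uses binary search to count the occurrences of a pattern. The construction is deliberately naive and is not the algorithm from the cited article; the function names and the sample string are assumptions made for illustration.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, sorted
    lexicographically by suffix. Naive construction, for illustration."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, pattern):
    """Count occurrences of pattern in text with two binary searches
    over the suffix array."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


sample = "banana"
sa = build_suffix_array(sample)
print(sa, count_occurrences(sample, sa, "ana"))
```

Every occurrence of a pattern is a prefix of some suffix, and those suffixes are contiguous in sorted order, so the count is the width of the matching range.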

2.4K

•Proceedings Article•10.3115/1118693.1118711

A Phrase-Based, Joint Probability Model for Statistical Machine Translation

Daniel Marcu, +1 more

- 06 Jul 2002

TL;DR: A joint probability model for statistical machine translation is presented that automatically learns word and phrase equivalents from bilingual corpora and yields translations that are more accurate than those produced using IBM Model 4.

735

•Proceedings Article

Improved Alignment Models for Statistical Machine Translation

Franz Josef Och, +2 more

- 01 Jan 1999

TL;DR: Improved alignment models for statistical machine translation are described, and experimental results are presented using the Verbmobil task (German-English, 6000-word vocabulary), which is a limited-domain spoken-language task.

614

•Proceedings Article•10.3115/976909.979680

A Word-to-Word Model of Translational Equivalence

I. Dan Melamed

- 07 Jul 1997

TL;DR: This paper proposes a fast algorithm for estimating a partial translation model that accounts for translational equivalence only at the word level, which makes the model more suitable for applications that are not fully statistical.

123

•Proceedings Article•10.3115/1219840.1219872

Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases

Chris Callison-Burch, +2 more

- 25 Jun 2005

TL;DR: A novel data structure for phrase-based statistical machine translation which allows for the retrieval of arbitrarily long phrases while simultaneously using less memory than is required by current decoder implementations is described.

117

Related Papers (5)

The mathematics of statistical machine translation: parameter estimation

A systematic comparison of various statistical alignment models

A Phrase-Based, Joint Probability Model for Statistical Machine Translation

Generalized Substring Compression

Substring range reporting