TL;DR: An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models.
Abstract: We present and compare various methods for computing word alignments using statistical or heuristic models. We consider the five alignment models presented in Brown, Della Pietra, Della Pietra, and Mercer (1993), the hidden Markov alignment model, smoothing techniques, and refinements. These statistical models are compared with two heuristic models based on the Dice coefficient. We present different methods for combining word alignments to perform a symmetrization of directed statistical alignment models. As an evaluation criterion, we use the quality of the resulting Viterbi alignment compared to a manually produced reference alignment. We evaluate the models on the German-English Verbmobil task and the French-English Hansards task. We perform a detailed analysis of various design decisions of our statistical alignment system and evaluate these on training corpora of various sizes. An important result is that refined alignment models with a first-order dependence and a fertility model yield significantly better results than simple heuristic models. In the Appendix, we present an efficient training algorithm for the alignment models presented.
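As a rough illustration of the kind of Dice-coefficient heuristic the abstract uses as a baseline, the sketch below scores word pairs by sentence-level co-occurrence and links each target word to its best-scoring source word. The function names (`dice_scores`, `align_greedy`), the toy corpus, and the choice to count each word type once per sentence are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from itertools import product


def dice_scores(sentence_pairs):
    """Map (src_word, tgt_word) to 2*C(s,t) / (C(s) + C(t)).

    Counts are per sentence pair: a word type is counted once per sentence
    (an assumption of this sketch).
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_types, tgt_types = set(src), set(tgt)
        src_count.update(src_types)
        tgt_count.update(tgt_types)
        pair_count.update(product(src_types, tgt_types))
    return {
        (s, t): 2.0 * c / (src_count[s] + tgt_count[t])
        for (s, t), c in pair_count.items()
    }


def align_greedy(src, tgt, scores):
    """For each target word, return the index of the best-scoring source word."""
    return [
        max(range(len(src)), key=lambda i: scores.get((src[i], t), 0.0))
        for t in tgt
    ]


# Hypothetical two-sentence corpus, for illustration only.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
]
scores = dice_scores(corpus)
links = align_greedy(["das", "haus"], ["the", "house"], scores)
```

A heuristic of this kind treats every target word independently, which is one reason the statistical models with first-order dependence and fertility can do better.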
TL;DR: Algorithms are designed to answer the following kinds of questions about ordered labeled trees: what is the distance between two trees, and the analogous questions for subtrees and for prunings.
Abstract: Ordered labeled trees are trees in which the left-to-right order among siblings is significant. The distance between two ordered trees is considered to be the weighted number of edit operations (in...
TL;DR: It is shown that the first problem is NP-complete and the second is MAX SNP-hard; the complexity of tree alignment with a given phylogeny is also considered.
Abstract: We study the computational complexity of two popular problems in multiple sequence alignment: multiple alignment with SP-score and multiple tree alignment. It is shown that the first problem is NP-complete and the second is MAX SNP-hard. The complexity of tree alignment with a given phylogeny is also considered.
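To make the SP-score objective concrete, here is a minimal sketch that evaluates the sum-of-pairs cost of a given multiple alignment. It only scores an alignment; it does not search for the optimal one, which is the problem the abstract shows to be NP-complete. The function name `sp_score`, the unit mismatch and gap costs, and the example rows are assumptions for illustration.

```python
from itertools import combinations


def sp_score(alignment, mismatch=1, gap=1):
    """Sum-of-pairs cost of a multiple alignment.

    `alignment` is a list of equal-length strings with '-' marking gaps.
    The cost is the sum, over every column and every pair of rows, of a
    pairwise symbol cost (unit costs here are an assumption of this sketch).
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            if a == b:
                continue  # identical symbols, including gap-gap, cost nothing
            total += gap if "-" in (a, b) else mismatch
    return total


# Hypothetical three-sequence alignment, for illustration only.
example = ["AC-T", "A-GT", "ACGT"]
cost = sp_score(example)
```

Scoring is cheap (quadratic in the number of sequences per column); the hardness lies in choosing where to place the gaps.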
TL;DR: An algorithm is presented which solves the problem of determining the distance from T to T' as measured by the minimum-cost sequence of edit operations needed to transform T into T'.
Abstract: The tree-to-tree correction problem is to determine, for two labeled ordered trees T and T', the distance from T to T' as measured by the minimum-cost sequence of edit operations needed to transform T into T'. The edit operations investigated allow changing one node of a tree into another node, deleting one node from a tree, or inserting a node into a tree. An algorithm is presented which solves this problem in time O(V · V' · L^2 · L'^2), where V and V' are the numbers of nodes respectively of T and T', and L and L' are the maximum depths respectively of T and T'. Possible applications are to the problems of measuring the similarity between trees, automatic error recovery and correction for programming languages, and determining the largest common substructure of two trees.
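The three edit operations named in the abstract (change a node, delete a node, insert a node) can be illustrated with a small memoized recursion over forests. This is not the algorithm of the paper and does not achieve its stated running time; it is a textbook-style sketch with unit costs, and the tree encoding `(label, children)` and the function names are assumptions made for the example.

```python
from functools import lru_cache

# A tree is (label, children), where children is a tuple of trees.
# A forest is a tuple of trees. Tuples keep everything hashable for caching.


def size(forest):
    """Number of nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return size(f2)  # insert every node of f2
    if not f2:
        return size(f1)  # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        # delete the rightmost root of f1; its children take its place
        forest_distance(f1[:-1] + kids1, f2) + 1,
        # insert the rightmost root of f2
        forest_distance(f1, f2[:-1] + kids2) + 1,
        # map the two rightmost roots onto each other, relabeling if needed
        forest_distance(f1[:-1], f2[:-1])
        + forest_distance(kids1, kids2)
        + (label1 != label2),
    )


def tree_distance(t1, t2):
    """Edit distance between two ordered labeled trees."""
    return forest_distance((t1,), (t2,))


# Hypothetical example trees, for illustration only.
leaf_b, leaf_c = ("b", ()), ("c", ())
t_abc = ("a", (leaf_b, leaf_c))                 # a(b, c)
t_ab = ("a", (leaf_b,))                         # a(b)
t_nested = ("a", (("d", (leaf_b, leaf_c)),))    # a(d(b, c))
```

Deleting a node reattaches its children to its parent in order, which is why the delete case splices `kids1` back into the forest.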
TL;DR: Given a finite tree in which some vertices are identified with given sequences, the authors show how to construct sequences for all the remaining vertices simultaneously so as to minimize the total edge-length of the tree, where edge-length is calculated by a metric whose biological significance is the mutational distance between two sequences.
Abstract: Given a finite tree, some of whose vertices are identified with given finite sequences, we show how to construct sequences for all the remaining vertices simultaneously, so as to minimize the total edge-length of the tree. Edge-length is calculated by a metric whose biological significance is the mutational distance between two sequences.