TL;DR: Many multiple alignment methods implicitly or explicitly try to minimize the amount of biological change implied by an alignment, while at the level of sequences, biological change is measured a fraction of a percent.
Abstract: Many multiple alignment methods implicitly or explicitly try to minimize the amount of biological change implied by an alignment. At the level of sequences, biological change is measured a...
TL;DR: This paper describes a parallel implementation of a sequence alignment algorithm for biomolecular sequence analysis that uses multithreaded programming for the most time-consuming functions and runs in X Window-based interactive systems.
Abstract: This paper describes a parallel implementation of a sequence alignment algorithm for biomolecular sequence analysis. It uses multithreaded programming for the most time-consuming functions and runs in X Window-based interactive systems. Its sequence alignment operations include pairwise alignment, star alignment, phylogeny reconstruction and generalized tree alignment. Both fast and optimal modes are provided. The algorithms for phylogeny reconstruction, generalized tree alignment, and tree alignment are based on heuristic stepwise addition and internal node sequence alignment induction methods. PTAR can be used for DNA, RNA, and protein sequence analysis. In general, the system can carry out alignments for any sequences composed of the characters a-z and A-Z.
TL;DR: This work examines both phylogenetic data and the structure of the search space in order to suggest methods to reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data.
Abstract: Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. As several steps in this process are NP-Hard when using popular, biologically-motivated optimality criteria, significant amounts of resources are dedicated both to heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods to reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with "perfect" data, while some others do not. We further characterize conditions for which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually-incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem, or GTAP: sequence alignment followed by tree search vs. Direct Optimization, on both biological and simulated data.
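The notion of compatible characters used in the abstract above can be illustrated, for binary characters only, with the classical four-gamete test: two binary characters are pairwise compatible exactly when the taxa do not exhibit all four state combinations (00, 01, 10, 11). The following is a minimal sketch under that assumption. The function names and the toy matrix are hypothetical and are not taken from the work itself, whose partitioning procedure may differ.

```python
from itertools import combinations

def compatible(col_a, col_b):
    """Four-gamete test for two binary characters (one state per taxon)."""
    # Fewer than four distinct state pairs means the two characters
    # can coexist on a single perfect phylogeny.
    return len(set(zip(col_a, col_b))) < 4

def incompatible_pairs(matrix):
    """Return index pairs of characters that fail the four-gamete test.

    matrix: list of rows, one per taxon, each a string of '0'/'1' states.
    """
    columns = list(zip(*matrix))  # transpose: one tuple per character
    return [
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if not compatible(columns[i], columns[j])
    ]

# Hypothetical toy data: four taxa, three binary characters.
toy = ["110", "100", "011", "001"]
conflicts = incompatible_pairs(toy)
```

Here characters 0 and 2 are compatible with each other, while character 1 conflicts with both, so one possible split into compatible subsets would be {0, 2} and {1}.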
TL;DR: Divakaran et al. presented constant-factor approximation algorithms for the constrained generalized tree alignment problem, a special case of the generalized tree alignment problem, and provided a guaranteed error bound of 2-2/k.
TL;DR: This paper mimics agglomerative clustering techniques as used for phylogenetic trees while at the same time aligning the sequences using the data structure of sequence graphs, and produces biologically meaningful answers on a set of Alu repeats.
Abstract: A formalization of the multiple sequence alignment problem that emphasizes the problem's evolutionary aspect is the Generalized Tree Alignment Problem. Given a set of sequences, this formalization asks for a phylogenetic tree and ancestral sequences such that the implied amount of change necessary to explain the given data is minimal. The problem is computationally hard and we present a heuristic algorithm for it. Our procedure mimics agglomerative clustering techniques as used for phylogenetic trees while at the same time aligning the sequences using the data structure of sequence graphs. The approach achieves good results in terms of the underlying scoring function. It produces biologically meaningful answers, which in this paper we demonstrate on a set of Alu repeats.
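The objective described in the abstract above, the amount of change implied by a tree together with its ancestral sequences, can be sketched as the sum of pairwise distances along the tree's edges. The example below only evaluates that objective for a candidate solution; it does not search for one. It assumes unit-cost edit distance as the measure of change, and the node names and sequences are hypothetical. The scoring function used in the paper may differ.

```python
def edit_distance(a, b):
    """Unit-cost Levenshtein distance, computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]

def tree_cost(edges, seqs):
    """Sum edit distances over all tree edges (assumed GTAP-style objective)."""
    return sum(edit_distance(seqs[u], seqs[v]) for u, v in edges)

# Hypothetical candidate: one internal node X joined to three leaves.
seqs = {"A": "ACGT", "B": "ACT", "C": "AGGT", "X": "ACGT"}
edges = [("X", "A"), ("X", "B"), ("X", "C")]
cost = tree_cost(edges, seqs)
```

A heuristic for the problem would compare such costs across candidate trees and ancestral sequences and keep the cheapest one found.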