Post-processing long pairwise alignments
TL;DR: This work develops an algorithm that decomposes a long alignment into sub-alignments that avoid potential imperfections and runs in time proportional to the original alignment's length.
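The decomposition idea can be illustrated with a small sketch. This is a hypothetical, simplified procedure, not the algorithm of the paper. It assumes an alignment has already been reduced to a list of per-column scores, and the names `decompose`, `_trim_to_best_suffix` and the threshold `x` are illustrative only. It makes one left-to-right X-drop pass and then trims each piece to its best suffix, so total work stays proportional to the alignment's length.

```python
"""Hypothetical linear-time decomposition of a long alignment.

A sketch only, not the authors' algorithm. Assumption: an alignment is
modeled as a list of per-column scores; the threshold name `x` is ours.
"""
from typing import List, Tuple

Piece = Tuple[int, int, int]  # (start, end, score), end exclusive


def _trim_to_best_suffix(scores: List[int], start: int, end: int) -> Piece:
    """Shrink scores[start:end] to its highest-scoring suffix."""
    run = best = scores[end - 1]
    best_start = end - 1
    for k in range(end - 2, start - 1, -1):
        run += scores[k]
        if run > best:  # strict: ties keep the shorter suffix
            best, best_start = run, k
    return best_start, end, best


def decompose(scores: List[int], x: int) -> List[Piece]:
    """Split an alignment into pieces in O(len(scores)) time.

    Pass 1 (X-drop scan): grow a piece until its running score falls more
    than x below the best so far, then cut it back to that best prefix.
    Pass 2: trim each piece to its best suffix. Every returned piece then
    (i) contains no segment scoring below -x and (ii) scores at least as
    much as any segment inside it. Columns outside the pieces are dropped.
    """
    pieces: List[Piece] = []
    start, n = 0, len(scores)
    while start < n:
        run, best, best_end, i = 0, 0, start, start
        while i < n:
            run += scores[i]
            if run > best:
                best, best_end = run, i + 1
            elif best - run > x:
                break  # X-drop: stop extending this piece
            i += 1
        if best_end > start:  # keep only pieces with a positive best prefix
            pieces.append(_trim_to_best_suffix(scores, start, best_end))
        start = i + 1  # resume after the column that triggered the drop
    return pieces


if __name__ == "__main__":
    cols = [3, 3, -1, 3, -10, -10, 4, 4, -2, 4]
    print(decompose(cols, x=5))  # [(0, 4, 8), (6, 10, 10)]
```

The design is deliberately conservative: the two trimming passes guarantee the properties listed in the docstring, but they may discard columns that a more careful method, such as the one the paper describes, would keep.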
Abstract:
Motivation: The local alignment problem for two sequences requires determining similar regions, one from each sequence, and aligning those regions. For alignments computed by dynamic programming, current approaches for selecting similar regions may have potential flaws. For instance, the criterion of Smith and Waterman can lead to inclusion of an arbitrarily poor internal segment. Other approaches can generate an alignment scoring less than some of its internal segments.
Results: We develop an algorithm that decomposes a long alignment into sub-alignments that avoid these potential imperfections. Our algorithm runs in time proportional to the original alignment’s length. Practical applications to alignments of genomic DNA sequences are described.
Availability: Software is available at http:// globin.cse.psu.
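Both imperfections can be reproduced on toy data. The sketch below is illustrative only. It assumes an alignment is reduced to per-column scores, treats the maximal-segment (Kadane) criterion on a fixed alignment path as a stand-in for the Smith and Waterman criterion, and assumes an X-drop extension of the kind used by Gapped BLAST as an example of the "other approaches". All function names are hypothetical.

```python
"""Two flaws named in the abstract, shown on a fixed alignment path.

Assumption for illustration only: an alignment is reduced to a list of
per-column scores (e.g. +1 match, -1 mismatch, gap costs folded into
columns). Function names are hypothetical.
"""
from typing import List, Tuple


def max_segment(scores: List[int]) -> Tuple[int, int, int]:
    """Maximum-scoring contiguous segment via Kadane's algorithm, O(n).

    On a fixed path this mimics the Smith-Waterman criterion: keep the
    highest-scoring stretch. Returns (score, start, end), end exclusive.
    """
    best, best_start, best_end = scores[0], 0, 1
    run, run_start = 0, 0
    for i, s in enumerate(scores):
        if run <= 0:  # a non-positive running sum cannot help; restart here
            run, run_start = s, i
        else:
            run += s
        if run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best, best_start, best_end


def min_segment(scores: List[int]) -> int:
    """Score of the worst contiguous segment (Kadane on negated scores)."""
    return -max_segment([-s for s in scores])[0]


def xdrop_extend(scores: List[int], x: int) -> Tuple[int, int]:
    """Assumed X-drop extension rightward from column 0.

    Stops once the running score falls more than x below its best, then
    trims back to the best prefix. Returns (score, end), end exclusive.
    """
    run, best, best_end = 0, 0, 0
    for i, s in enumerate(scores):
        run += s
        if run > best:
            best, best_end = run, i + 1
        elif best - run > x:
            break
    return best, best_end


if __name__ == "__main__":
    # Flaw 1: strong flanks around a poor middle. The maximal segment spans
    # everything, yet contains a stretch scoring -40; longer flanks would
    # let that stretch be arbitrarily poor.
    cols = [1] * 50 + [-1] * 40 + [1] * 50
    score, lo, hi = max_segment(cols)
    print(score, lo, hi, min_segment(cols[lo:hi]))  # 60 0 140 -40

    # Flaw 2: from a seed column scoring 2, X-drop (x=6) tolerates the dip
    # and keeps all three columns (score 3), although the last column alone
    # scores 6.
    print(xdrop_extend([2, -5, 6], x=6), max_segment([2, -5, 6])[0])  # (3, 3) 6
```

Both helpers run in a single pass, so checking a candidate sub-alignment for these two flaws costs time proportional to its length.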
Citations
Human–Mouse Alignments with BLASTZ
Scott Schwartz, W. James Kent, Arian F.A. Smit, Zheng Zhang, Robert Baertsch, Ross C. Hardison, David Haussler, Webb Miller
TL;DR: This work describes BLASTZ, an independent implementation of the Gapped BLAST algorithm specifically designed for aligning two long genomic sequences, and its modifications, the hardware environment on which it is run, and several empirical studies to validate its results.
GraphAligner: rapid and versatile sequence-to-graph alignment
TL;DR: GraphAligner, a tool for aligning long reads to genome graphs, is presented and found to be more than twice as accurate and over 12x faster than extant tools.
A new repeat-masking method enables specific detection of homologous sequences
TL;DR: This paper presents a new repeat-masking method, tantan, which is motivated by the mechanisms that create simple repeats and enables accurate homology search for non-coding DNA with extreme A + T composition.
Sequence and Comparative Analysis of the Mouse 1-Megabase Region Orthologous to the Human 11p15 Imprinted Domain
Patrick Onyango, Webb Miller, Jessica A. Lehoczky, Cheuk T. Leung, Bruce W. Birren, Sarah J. Wheelan, Ken Dewar, Andrew P. Feinberg
TL;DR: This study provides the first global view of the architecture of an entire imprinted domain and provides candidate sequence elements for subsequent functional analyses.
A new approach to sequence comparison: normalized sequence alignment
TL;DR: Normalized Local Alignment (NLA) is based on fractional programming, and its O(n^2 log n) running time is compared with that of the standard Smith-Waterman algorithm.
References
Basic Local Alignment Search Tool
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, David J. Lipman
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
A general method applicable to the search for similarities in the amino acid sequence of two proteins
TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed; with it, it is possible to determine whether significant homology exists between the proteins and to trace their possible evolutionary development.
Identification of common molecular subsequences
TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).
Protein sequence similarity searches using patterns as seeds
Zheng Zhang, Alejandro A. Schäffer, Webb Miller, Thomas L. Madden, David J. Lipman, Eugene V. Koonin, Stephen F. Altschul
TL;DR: The pattern-hit initiated BLAST (PHI-BLAST) program described here takes as input both a protein sequence and a pattern of interest that it contains, and searches a protein database for other instances of the input pattern, and uses those found as seeds for the construction of local alignments to the query sequence.