Journal Article, DOI: 10.1137/S0097539794264810
Incremental String Comparison
239
TL;DR: This paper considers an incremental version of the problem of comparing two sequences A and B to determine their longest common subsequence (LCS) or the edit distance between them, and obtains O(nk) algorithms for the longest prefix approximate match problem, the approximate overlap problem, and cyclic string comparison.
Abstract: The problem of comparing two sequences A and B to determine their longest common subsequence (LCS) or the edit distance between them has been much studied. In this paper we consider the following incremental version of these problems: given an appropriate encoding of a comparison between A and B, can one incrementally compute the answer for A and bB, and the answer for A and Bb with equal efficiency, where b is an additional symbol? Our main result is a theorem exposing a surprising relationship between the dynamic programming solutions for two such "adjacent" problems. Given a threshold k on the number of differences to be permitted in an alignment, the theorem leads directly to an O(k) algorithm for incrementally computing a new solution from an old one, in contrast to the O(k^2) time required to compute a solution from scratch. We further show, with a series of applications, that this algorithm is indeed more powerful than its nonincremental counterpart by solving the applications with greater asymptotic efficiency than heretofore possible. For example, we obtain O(nk) algorithms for the longest prefix approximate match problem, the approximate overlap problem, and cyclic string comparison.
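To make the threshold k in the abstract concrete, here is a minimal Python sketch of a from-scratch, thresholded edit distance computation. It is not the paper's incremental algorithm and not the O(k^2) method the abstract alludes to: it is the simpler banded dynamic program, which fills only the O(k) cells per row that lie within k diagonals of the main diagonal, for O(nk) time overall. The function names `bounded_edit_distance` and `naive_cyclic_distance` are hypothetical, and the second function assumes that cyclic string comparison means the best distance over all rotations of B.

```python
from typing import Optional


def bounded_edit_distance(a: str, b: str, k: int) -> Optional[int]:
    """Return the unit-cost edit distance of a and b if it is <= k, else None.

    Only cells (i, j) with |i - j| <= k are computed, since any alignment
    with at most k differences stays inside that band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None
    inf = k + 1                  # any value above k means "too many differences"
    width = 2 * k + 1            # band index d corresponds to column j = i + d - k
    prev = [inf] * width
    for d in range(width):       # row 0: D[0][j] = j
        j = d - k
        if 0 <= j <= m:
            prev[d] = j
    for i in range(1, n + 1):
        cur = [inf] * width
        for d in range(width):
            j = i + d - k
            if j < 0 or j > m:
                continue
            if j == 0:
                best = i                                   # delete all of a[:i]
            else:
                best = prev[d] + (a[i - 1] != b[j - 1])    # match or substitute
                if d > 0:
                    best = min(best, cur[d - 1] + 1)       # insert b[j-1]
            if d + 1 < width:
                best = min(best, prev[d + 1] + 1)          # delete a[i-1]
            cur[d] = min(best, inf)
        prev = cur
    result = prev[m - n + k]
    return result if result <= k else None


def naive_cyclic_distance(a: str, b: str, k: int) -> Optional[int]:
    """Brute-force baseline: try every rotation of b (assumed definition)."""
    best = None
    for r in range(max(1, len(b))):
        d = bounded_edit_distance(a, b[r:] + b[:r], k)
        if d is not None and (best is None or d < best):
            best = d
    return best
```

The brute-force cyclic comparison repeats the O(nk) computation once per rotation; the abstract's claim is that the incremental approach brings the total for this problem down to O(nk).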
Citations
A guided tour to approximate string matching
TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.
Sequential pattern mining -- approaches and algorithms
Carl Mooney, John F. Roddick +1 more
TL;DR: This article surveys the approaches and algorithms proposed to date in Sequential Pattern Mining, a subfield of data mining that focuses on detecting and analyzing frequent subsequences in data.
362
All Highest Scoring Paths in Weighted Grid Graphs and Their Application to Finding All Approximate Repeats in Strings
TL;DR: This work builds a data structure that supports O(mn log m) time queries about the weight of any of the O(m^2 n) best paths from the vertices in column 0 of the graph to all other vertices, and presents a simple O(n^2 log n) time and Θ(n^2) space algorithm to find all approximate tandem repeats xy within a string of size n.
156
Faster algorithms for string matching with k mismatches
TL;DR: The string matching with mismatches problem is that of finding the number of mismatches between a pattern P of length m and every length-m substring of the text T.
155
Dissertation
Nearest neighbor search : the old, the new, and the impossible
Alexandr Andoni
- 01 Jan 2009
TL;DR: This thesis gives a new algorithm for the approximate NN problem in d-dimensional Euclidean space, and gives evidence that the classical approaches to NN under certain hard distances, such as the string edit distance, are likely to fail.
129
References
A general method applicable to the search for similarities in the amino acid sequence of two proteins
TL;DR: A computer adaptable method for finding similarities in the amino acid sequences of two proteins has been developed and it is possible to determine whether significant homology exists between the proteins to trace their possible evolutionary development.
13.2K
Identification of common molecular subsequences.
TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).
11.3K
The String-to-String Correction Problem
TL;DR: An algorithm is presented which solves the string-to-string correction problem in time proportional to the product of the lengths of the two strings.
3.5K
Linear pattern matching algorithms
Peter Weiner
- 15 Oct 1973
TL;DR: A linear time algorithm for obtaining a compacted version of a bi-tree associated with a given string is presented, and it is indicated how to solve several pattern matching problems, including some from [4], in linear time.
2.1K
A Space-Economical Suffix Tree Construction Algorithm
TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching that has the same asymptotic running time bound as previously published algorithms, but is more economical in space.
1.7K