About: Longest common substring problem is a research topic. Over the lifetime, 503 publications have been published within this topic receiving 20548 citations.
TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching that has the same asymptotic running time bound as previously published algorithms, but is more economical in space.
Abstract: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching. This algorithm has the same asymptotic running time bound as previously published algorithms, but is more economical in space. Some implementation considerations are discussed, and new work on the modification of these search trees in response to incremental changes in the strings they index (the update problem) is presented.
TL;DR: This article shows how every algorithm that uses a suffix tree as data structure can systematically be replaced with an algorithm that use an enhanced suffix array and solves the same problem in the same time complexity.
TL;DR: It is shown that the algorithm is crucial to the effective use of block-sorting compression and a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information is presented.
Abstract: We present a linear-time algorithm to compute the longest common prefix information in suffix arrays As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information
TL;DR: This work builds suffix trees in linear time for integer alphabet using Weiner's algorithm, which matches a trivial /spl Omega/(n log n)-time lower bound based on sorting.
Abstract: The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. Weiner (1973), who introduced the data structure, gave an O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant size alphabet. In the comparison model, there is a trivial /spl Omega/(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound trivially. For integer alphabets, a substantial gap remains between the known upper and lower bounds, and closing this gap is the main open question in the construction of suffix trees. There is no super-linear lower bound, and the fastest known algorithm was the O(n log n) time comparison based algorithm. We settle this open problem by closing the gap: we build suffix trees in linear time for integer alphabet.
TL;DR: This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences, and proves that the Closest Substring Problem is NP-Hard.
Abstract: This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences. All these problems reduce to the task of finding a pattern that, with some error, occurs in one set of strings (Closest Substring Problem) and does not occur in another set (Farthest String Problem). In this paper, we break down the problem into several subproblems and prove the following results. 1. The following are all NP-Hard: the Farthest String Problem, the Closest Substring Problem, and the Closest String Problem of finding a string that is close to each string in a set. 2. There is a PTAS for the Farthest String Problem based on a linear programming relaxation technique. 3. There is a polynomial-time (4/3 + e)-approximation algorithm for the Closest String Problem for any small constant e > 0. Using this algorithm, we also provide an efficient heuristic algorithm for the Closest Substring Problem. 4. The problem of finding a string that is at least Hamming distance d from as many strings in a set as possible, cannot be approximated within ne in polynomial time for some fixed constant e unless NP = P, where n is the number of strings in the set. 5. There is a polynomial-time 2-approximation for finding a string that is both the Closest Substring to one set, and the Farthest String from another set.