Longest common substring problem

Papers published on a yearly basis

Papers

Journal Article • DOI: 10.1145/321941.321946

A Space-Economical Suffix Tree Construction Algorithm

Edward M. McCreight¹

PARC¹

01 Apr 1976-Journal of the ACM

TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching. It has the same asymptotic running time bound as previously published algorithms but is more economical in space.

Abstract: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching. This algorithm has the same asymptotic running time bound as previously published algorithms, but is more economical in space. Some implementation considerations are discussed, and new work on the modification of these search trees in response to incremental changes in the strings they index (the update problem) is presented.

1,778 citations
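
The exact-match query that such an index supports can be sketched with a naive, uncompressed suffix trie in Python. This is an illustrative assumption, not McCreight's algorithm: it inserts every suffix character by character, so it can take quadratic time and space, which is precisely the cost a compacted, space-economical suffix tree is meant to avoid. The function names `build_suffix_trie` and `contains` are hypothetical.

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of `text` into a nested-dict trie.

    Naive construction: O(n^2) time and up to O(n^2) nodes, which is the
    space cost a compacted suffix tree is designed to avoid.
    """
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie: dict, pattern: str) -> bool:
    """`pattern` occurs in the text iff it spells a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))  # True
print(contains(trie, "nab"))  # False
```

Every substring of the text is a prefix of some suffix, which is why a root-to-node walk is enough to answer the query.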

Journal Article • DOI: 10.1016/S1570-8667(03)00065-0

Replacing suffix trees with enhanced suffix arrays

Mohamed Abouelhoda¹, Stefan Kurtz², Enno Ohlebusch¹

University of Ulm¹, University of Hamburg²

01 Mar 2004-Journal of Discrete Algorithms

TL;DR: This article shows how every algorithm that uses a suffix tree as its data structure can systematically be replaced with an algorithm that uses an enhanced suffix array and solves the same problem in the same time complexity.

813 citations
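
A classic suffix-array application to the longest common substring problem can be sketched as follows, assuming two input strings joined by a separator character that occurs in neither. The answer is the largest longest-common-prefix (LCP) value between lexicographically adjacent suffixes that start in different input strings. For brevity this sketch builds the suffix array by plain sorting and computes each LCP by direct comparison, so it is quadratic or worse; a linear-time LCP computation such as the Kasai et al. algorithm listed on this page would replace the second step. The function name and the default separator `"\x00"` are hypothetical choices.

```python
def longest_common_substring(a: str, b: str, sep: str = "\x00") -> str:
    """Longest common substring of `a` and `b` via a suffix array of a + sep + b.

    Assumes `sep` is a single character that occurs in neither input.
    """
    assert len(sep) == 1 and sep not in a and sep not in b
    text = a + sep + b
    n = len(text)
    # Naive suffix array: sort start positions by the suffix each one begins.
    sa = sorted(range(n), key=lambda i: text[i:])
    best_len, best_start = 0, 0
    for k in range(1, n):
        i, j = sa[k - 1], sa[k]
        # Only neighbours that start in different inputs yield a common substring.
        if (i < len(a)) == (j < len(a)):
            continue
        # Naive LCP of the two neighbours; the match always stops before `sep`,
        # because `sep` never matches any character of `a` or `b`.
        lcp = 0
        while i + lcp < n and j + lcp < n and text[i + lcp] == text[j + lcp]:
            lcp += 1
        if lcp > best_len:
            best_len, best_start = lcp, i
    return text[best_start:best_start + best_len]


print(longest_common_substring("banana", "ananas"))  # anana
```

Restricting the scan to adjacent pairs is sufficient because, in sorted order, the LCP of two suffixes is the minimum of the adjacent LCP values between them.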

Book Chapter • DOI: 10.1007/3-540-48194-X_17

Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

Toru Kasai¹, Gunho Lee², Hiroki Arimura¹, Setsuo Arikawa¹, Kunsoo Park²

Kyushu University¹, Seoul National University²

1 Jul 2001

TL;DR: It is shown that the algorithm is crucial to the effective use of block-sorting compression, and a linear-time algorithm is presented that simulates the bottom-up traversal of a suffix tree using a suffix array combined with longest-common-prefix information.

Abstract: We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.

583 citations
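
The core idea of the linear-time LCP computation can be sketched in Python, following the widely taught formulation in which suffixes are visited in text order rather than sorted order. If the suffix starting at i shares h characters with its predecessor in the suffix array, the suffix starting at i + 1 shares at least h - 1 characters with its own predecessor, so h drops by at most one per step and the total work is O(n). The sketch assumes the suffix array `sa` is already given; the function name `lcp_array` and the convention `lcp[0] = 0` are assumptions.

```python
def lcp_array(text: str, sa: list[int]) -> list[int]:
    """LCP array in O(n) given the suffix array `sa` of `text`.

    lcp[k] is the length of the longest common prefix of the suffixes
    starting at sa[k - 1] and sa[k]; lcp[0] is 0 by convention.
    """
    n = len(text)
    rank = [0] * n
    for k, pos in enumerate(sa):
        rank[pos] = k
    lcp = [0] * n
    h = 0
    # Visit suffixes in text order (i = 0, 1, ...), not in suffix-array order.
    for i in range(n):
        if rank[i] == 0:
            h = 0  # the smallest suffix has no predecessor
            continue
        j = sa[rank[i] - 1]  # predecessor of text[i:] in sorted order
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        # Dropping the first character keeps at least h - 1 matching characters
        # for the suffix at i + 1, so h falls by at most one per step.
        if h > 0:
            h -= 1
    return lcp


text = "banana"
sa = sorted(range(len(text)), key=lambda i: text[i:])  # [5, 3, 1, 0, 4, 2]
print(lcp_array(text, sa))  # [0, 1, 3, 0, 0, 2]
```

Building `sa` by sorting, as in the usage line, is only for illustration; the LCP step itself stays linear regardless of how the suffix array was obtained.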

Proceedings Article • DOI: 10.1109/SFCS.1997.646102

Optimal suffix tree construction with large alphabets

Martin Farach¹

Rutgers University¹

19 Oct 1997

TL;DR: This work builds suffix trees in linear time for integer alphabets, closing the gap between the known upper and lower bounds for that case; in the comparison model, Weiner's algorithm already matches the trivial Ω(n log n)-time lower bound based on sorting.

Abstract: The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. Weiner (1973), who introduced the data structure, gave an O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant-size alphabet. In the comparison model, there is a trivial Ω(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound trivially. For integer alphabets, a substantial gap remains between the known upper and lower bounds, and closing this gap is the main open question in the construction of suffix trees. There is no super-linear lower bound, and the fastest known algorithm was the O(n log n)-time comparison-based algorithm. We settle this open problem by closing the gap: we build suffix trees in linear time for integer alphabet.

480 citations

Journal Article • DOI: 10.1016/S0890-5401(03)00057-9

Distinguishing string selection problems

J. Kevin Lanctot¹, Ming Li¹, Bin Ma², Shaojiu Wang, Louxin Zhang³

University of Waterloo¹, University of Western Ontario², National University of Singapore³

25 Aug 2003-Information & Computation

TL;DR: This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences, and proves that the Closest Substring Problem is NP-Hard.

Abstract: This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences. All these problems reduce to the task of finding a pattern that, with some error, occurs in one set of strings (Closest Substring Problem) and does not occur in another set (Farthest String Problem). In this paper, we break down the problem into several subproblems and prove the following results.

1. The following are all NP-hard: the Farthest String Problem, the Closest Substring Problem, and the Closest String Problem of finding a string that is close to each string in a set.
2. There is a PTAS for the Farthest String Problem based on a linear programming relaxation technique.
3. There is a polynomial-time (4/3 + ε)-approximation algorithm for the Closest String Problem for any small constant ε > 0. Using this algorithm, we also provide an efficient heuristic algorithm for the Closest Substring Problem.
4. The problem of finding a string that is at least Hamming distance d from as many strings in a set as possible cannot be approximated within n^ε in polynomial time for some fixed constant ε unless NP = P, where n is the number of strings in the set.
5. There is a polynomial-time 2-approximation for finding a string that is both the Closest Substring to one set and the Farthest String from another set.

299 citations
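
To make the Closest Substring objective concrete, here is a brute-force Python sketch under illustrative assumptions: Hamming distance, a small explicit alphabet, and a target length L no larger than any input string. The distance from a candidate t to a string s is the smallest Hamming distance between t and any length-L substring of s, and the goal is the t that minimizes the largest such distance over all input strings. The search enumerates all |alphabet|^L candidates, so it is exponential in L, consistent with (though not a demonstration of) the NP-hardness result above; it is not the PTAS or the approximation algorithms from the paper. All function names are hypothetical.

```python
from itertools import product


def hamming(x: str, y: str) -> int:
    """Number of positions at which equal-length strings x and y differ."""
    return sum(cx != cy for cx, cy in zip(x, y))


def distance_to_string(t: str, s: str) -> int:
    """Smallest Hamming distance between t and any length-len(t) substring of s."""
    L = len(t)
    return min(hamming(t, s[i:i + L]) for i in range(len(s) - L + 1))


def closest_substring(strings: list[str], L: int, alphabet: str) -> tuple[str, int]:
    """Exhaustively find t in alphabet**L minimizing max_i distance_to_string(t, s_i).

    Assumes non-empty `strings` and `alphabet`, and every string of length >= L.
    Runs in time exponential in L (|alphabet|**L candidates).
    """
    assert strings and alphabet and all(len(s) >= L for s in strings)
    best, best_d = "", -1
    for letters in product(alphabet, repeat=L):
        t = "".join(letters)
        d = max(distance_to_string(t, s) for s in strings)
        if best_d < 0 or d < best_d:  # keep the first candidate on ties
            best, best_d = t, d
    return best, best_d


print(closest_substring(["acgtac", "ttacgg", "gacgta"], 3, "acgt"))  # ('acg', 0)
```

A distance of 0 means the returned string occurs exactly in every input; positive values measure how many mismatches the best common pattern must tolerate.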

...

Year	Papers
2021	8
2020	18
2019	10
2018	13
2017	13
2016	24
