Longest repeated substring problem

Papers

Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

Toru Kasai¹, Gunho Lee², Hiroki Arimura¹ ³, Setsuo Arikawa¹, Kunsoo Park²

Kyushu University¹, Seoul National University², PRESTO, Japan Science and Technology Corporation³

1 Jul 2001

TL;DR: A linear-time algorithm for computing longest-common-prefix (LCP) information in suffix arrays. The authors show that it is crucial to the effective use of block-sorting compression, and they use it to simulate the bottom-up traversal of a suffix tree with a suffix array plus LCP information in linear time.

Abstract: We present a linear-time algorithm to compute the longest common prefix information in suffix arrays. As two applications of our algorithm, we show that our algorithm is crucial to the effective use of block-sorting compression, and we present a linear-time algorithm to simulate the bottom-up traversal of a suffix tree with a suffix array combined with the longest common prefix information.

583 citations
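The abstract's core idea, and its link to this topic, can be sketched briefly. Below is a minimal Python sketch, assuming the usual Kasai et al. convention that lcp[r] holds the longest common prefix of the suffixes ranked r-1 and r. The function names are hypothetical, and the suffix array is built by naive sorting for brevity rather than in linear time. The longest repeated substring is then the longest prefix shared by two adjacent suffixes in sorted order, i.e. the maximum LCP entry.

```python
def suffix_array(s: str) -> list[int]:
    # Naive construction by sorting suffixes (illustration only, not linear time).
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s: str, sa: list[int]) -> list[int]:
    """Kasai-style LCP: lcp[r] = LCP of suffixes sa[r-1] and sa[r]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0  # current match length; it drops by at most 1 per text position
    for i in range(n):  # visit suffixes in text order, not sorted order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix just before s[i:] in sorted order
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # suffix i+1 shares at least h-1 characters with its predecessor
        else:
            h = 0  # the smallest suffix has no predecessor
    return lcp


def longest_repeated_substring(s: str) -> str:
    """Longest substring occurring at least twice (occurrences may overlap)."""
    if not s:
        return ""
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    r = max(range(len(s)), key=lambda k: lcp[k])  # rank with the largest LCP
    return s[sa[r]:sa[r] + lcp[r]]
```

For "banana" the suffix array is [5, 3, 1, 0, 4, 2], the LCP array is [0, 1, 3, 0, 0, 2], and the longest repeated substring is "ana". Because h decreases by at most one from suffix i to suffix i+1, the total character comparisons in lcp_array are O(n).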

Journal Article • DOI: 10.1016/S0890-5401(03)00057-9

Distinguishing string selection problems

J. Kevin Lanctot¹, Ming Li¹, Bin Ma², Shaojiu Wang, Louxin Zhang³

University of Waterloo¹, University of Western Ontario², National University of Singapore³

25 Aug 2003 • Information & Computation

TL;DR: Presents a collection of string algorithms at the core of several biological problems, such as discovering potential drug targets and creating diagnostic probes, universal primers, or unbiased consensus sequences, and proves that the Closest Substring Problem is NP-hard.

Abstract: This paper presents a collection of string algorithms that are at the core of several biological problems such as discovering potential drug targets, creating diagnostic probes, universal primers or unbiased consensus sequences. All these problems reduce to the task of finding a pattern that, with some error, occurs in one set of strings (Closest Substring Problem) and does not occur in another set (Farthest String Problem). In this paper, we break down the problem into several subproblems and prove the following results.

1. The following are all NP-hard: the Farthest String Problem, the Closest Substring Problem, and the Closest String Problem of finding a string that is close to each string in a set.
2. There is a PTAS for the Farthest String Problem based on a linear programming relaxation technique.
3. There is a polynomial-time (4/3 + ε)-approximation algorithm for the Closest String Problem for any small constant ε > 0. Using this algorithm, we also provide an efficient heuristic algorithm for the Closest Substring Problem.
4. The problem of finding a string that is at least Hamming distance d from as many strings in a set as possible cannot be approximated within n^ε in polynomial time for some fixed constant ε unless NP = P, where n is the number of strings in the set.
5. There is a polynomial-time 2-approximation for finding a string that is both the Closest Substring to one set and the Farthest String from another set.

299 citations
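To make the Closest String definition concrete, here is a hypothetical brute-force Python sketch under Hamming distance. It is not the paper's (4/3 + ε)-approximation or its PTAS. It simply enumerates every candidate string over an assumed alphabet and keeps the one with the smallest maximum distance to the inputs, which takes time exponential in the string length, as one would expect for an NP-hard problem.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def closest_string(strings: list[str], alphabet: str) -> tuple[str, int]:
    """Exhaustive Closest String: minimize the maximum Hamming distance (radius).

    Tries all |alphabet| ** L candidates of length L, so it is only usable on tiny
    inputs. Ties are broken by the first candidate in itertools.product order.
    """
    if not strings:
        raise ValueError("need at least one input string")
    length = len(strings[0])
    best, best_radius = strings[0], length + 1
    for chars in product(alphabet, repeat=length):
        candidate = "".join(chars)
        radius = max(hamming(candidate, s) for s in strings)
        if radius < best_radius:
            best, best_radius = candidate, radius
    return best, best_radius
```

For the inputs "AAT", "ATT", and "TAT" over the alphabet "AT", the sketch returns ("AAT", 1): every input is within one substitution of "AAT", and no string is within distance 0 of all three.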

Patent

Method, system, and program for determining boundaries in a string using a dictionary

Richard Theodore Gillam¹

IBM¹

1 Sep 1999

TL;DR: A system, method, and program for determining boundaries in a string of characters using a dictionary whose substrings may be words. A boundary is placed after the selected initial substring and after each dictionary substring that makes up the remaining characters.

Abstract: Disclosed is a system, method, and program for determining boundaries in a string of characters using a dictionary, wherein the substrings in the dictionary may comprise words. All possible initial substrings of the string that appear in the dictionary are determined. One initial substring is selected such that all the characters following it can be divided into at least one substring in the dictionary. The boundaries follow the initial substring and each of the one or more substrings that together contain all the characters following the initial substring.

131 citations
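The abstract describes dictionary-driven segmentation: choose an initial dictionary substring whose remainder can itself be split into dictionary entries, then place a boundary after each piece. Below is a minimal Python sketch of that idea. It assumes the dictionary is a plain set of strings and that longer initial substrings are tried first; the patent text quoted here does not specify a search order, and the function name is hypothetical. Memoizing on the start offset keeps backtracking from re-exploring the same suffix.

```python
from functools import lru_cache
from typing import Optional


def dictionary_boundaries(text: str, dictionary: set[str]) -> Optional[list[int]]:
    """Return boundary offsets (each just after a dictionary substring), or None.

    Hypothetical helper: tries longer initial substrings first (an assumption)
    and backtracks when the remaining characters cannot be fully segmented.
    """
    max_len = max(map(len, dictionary), default=0)

    @lru_cache(maxsize=None)
    def segment(start: int) -> Optional[tuple[int, ...]]:
        if start == len(text):
            return ()  # nothing left: the empty segmentation succeeds
        # Candidate initial substrings beginning at `start`, longest first.
        for end in range(min(len(text), start + max_len), start, -1):
            if text[start:end] in dictionary:
                rest = segment(end)  # can the following characters be divided?
                if rest is not None:
                    return (end,) + rest
        return None  # no initial substring works from this position

    result = segment(0)
    return list(result) if result is not None else None
```

With the dictionary {"ab", "abc", "cd"} and the text "abcd", the longer initial substring "abc" fails because "d" is not in the dictionary, so the sketch falls back to "ab" + "cd" and returns the boundaries [2, 4].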

Journal Article • DOI: 10.1016/J.JDA.2005.01.010

Edit distance with move operations

Dana Shapira¹, James A. Storer¹

Brandeis University¹

01 Jun 2007 • Journal of Discrete Algorithms

TL;DR: A polynomial-time greedy algorithm for non-recursive moves that, on a subclass of instances of size n, achieves an approximation factor of at most O(log n) relative to optimal.

108 citations

Journal Article

Edit distance with move operations

Dana Shapira¹, James A. Storer¹

Brandeis University¹

01 Jan 2002 • Lecture Notes in Computer Science

TL;DR: This work considers the more general problem of strings being represented by a singly linked list and being able to apply these operations to the pointer associated with a vertex as well as the character associated with the vertex, and shows that this problem is NP-complete.

Abstract: The traditional edit-distance problem is to find the minimum number of insert-character and delete-character (and sometimes change-character) operations required to transform one string into another. Here we consider the more general problem of strings being represented by a singly linked list (one character per node) and being able to apply these operations to the pointer associated with a vertex as well as the character associated with the vertex. That is, in O(1) time, not only can characters be inserted or deleted, but also substrings can be moved or deleted. We limit our attention to the ability to move substrings and leave substring deletions for future research. Note that O(1)-time substring move operations imply O(1) substring exchange operations as well, a form of transformation that has been of interest in molecular biology. We show that this problem is NP-complete, show that a recursive sequence of moves can be simulated with at most a constant factor increase by a non-recursive sequence, and present a polynomial-time greedy algorithm for non-recursive moves with a worst-case log factor approximation to optimal. The development of this greedy algorithm shows how to reduce moves of substrings to moves of characters, and how to convert moves of characters into only inserts and deletes of characters.

106 citations
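For reference, the traditional edit distance that the abstract starts from can be computed with the standard dynamic program. The Python sketch below covers only that classical baseline; the function name and the allow_change flag are hypothetical. It does not implement the substring-move model, which the paper shows to be NP-complete, nor the greedy approximation.

```python
def edit_distance(a: str, b: str, allow_change: bool = True) -> int:
    """Minimum number of character inserts and deletes (plus changes, if allowed)
    needed to turn a into b, using a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)  # distance from a[:i] to ""
        for j in range(1, len(b) + 1):
            best = min(prev[j] + 1, cur[j - 1] + 1)  # delete a[i-1] or insert b[j-1]
            if a[i - 1] == b[j - 1]:
                best = min(best, prev[j - 1])  # keep a matching character for free
            elif allow_change:
                best = min(best, prev[j - 1] + 1)  # change a[i-1] into b[j-1]
            cur[j] = best
        prev = cur
    return prev[-1]
```

For "abcdef" versus "defabc", the insert/delete-only distance is 6 (their longest common subsequence has length 3), whereas moving the substring "def" to the front takes a single move operation. That gap is what motivates the move model studied here.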

...

Year	Papers
2018	2
2017	6
2016	7
2015	10
2014	7
2013	10
