Suffix tree

Papers published on a yearly basis

Papers

Monograph•10.1017/CBO9780511574931•

Algorithms on Strings, Trees, and Sequences: Suffix Trees and Their Uses

Dan Gusfield

1 Jan 1997

2,830 citations

Journal Article•10.1137/0222058•

Suffix arrays: a new method for on-line string searches

Udi Manber¹, Gene Myers¹•Institutions (1)

University of Arizona¹

01 Oct 1993-SIAM Journal on Computing

TL;DR: A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.

Abstract: A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper. Constructing and querying suffix arrays is reduced to a sort and search paradigm that employs novel algorithms. The main advantage of suffix arrays over suffix trees is that, in practice, they use three to five times less space. From a complexity standpoint, suffix arrays permit on-line string searches of the type, “Is W a substring of A?” to be answered in time $O(P + \log N)$, where P is the length of W and N is the length of A, which is competitive with (and in some cases slightly better than) suffix trees. The only drawback is that in those instances where the underlying alphabet is finite and small, suffix trees can be constructed in $O(N)$ time in the worst case, versus $O(N\log N)$ time for suffix arrays. However, an augmented algorithm is given that, regardless of the alphabet size, constructs suffix arrays in $O(N)$ expected time, albeit with lesser space efficiency. It is ...

2,421 citations
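
The "sort and search" idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's algorithm: construction here simply sorts the suffixes with Python's built-in sort, and the query is a plain binary search that costs $O(P \log N)$ character comparisons, without the longest-common-prefix information that the faster $O(P + \log N)$ bound relies on. The function names `build_suffix_array` and `is_substring` are hypothetical, chosen for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order.

    Assumption: a simple comparison sort over suffix slices, which is
    far from the construction bounds discussed in the abstract.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def is_substring(text, sa, pattern):
    """Answer "Is pattern a substring of text?" by binary search over sa."""
    if not pattern:
        return True
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Find the first suffix whose length-m prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(sa) and text[sa[lo]:sa[lo] + m] == pattern


text = "banana"
sa = build_suffix_array(text)
print(sa)                             # [5, 3, 1, 0, 4, 2]
print(is_substring(text, sa, "nan"))  # True
print(is_substring(text, sa, "nab"))  # False
```

The array stores one integer per text position and nothing else, which is the source of the space advantage over a pointer-based tree that the abstract describes.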

Journal Article•10.1371/JOURNAL.PCBI.1005944•

MUMmer4: A fast and versatile genome alignment system.

Guillaume Marçais¹, Arthur L. Delcher², Adam M. Phillippy³, Rachel Coston², Steven L. Salzberg², Aleksey V. Zimin¹,²•Institutions (3)

University of Maryland, College Park¹, Johns Hopkins University², National Institutes of Health³

26 Jan 2018-PLOS Computational Biology

TL;DR: MUMmer4 is described, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences.

Abstract: The MUMmer system and the genome sequence aligner nucmer included within it are among the most widely used alignment packages in genomics. Since the last major release of MUMmer version 3 in 2004, it has been applied to many types of problems including aligning whole genome sequences, aligning reads to a reference genome, and comparing different assemblies of the same genome. Despite its broad utility, MUMmer3 has limitations that can make it difficult to use for large genomes and for the very large sequence data sets that are common today. In this paper we describe MUMmer4, a substantially improved version of MUMmer that addresses genome size constraints by changing the 32-bit suffix tree data structure at the core of MUMmer to a 48-bit suffix array, and that offers improved speed through parallel processing of input query sequences. With a theoretical limit on the input size of 141 Tbp, MUMmer4 can now work with input sequences of any biologically realistic length. We show that as a result of these enhancements, the nucmer program in MUMmer4 is easily able to handle alignments of large genomes; we illustrate this with an alignment of the human and chimpanzee genomes, which allows us to compute that the two species are 98% identical across 96% of their length. With the enhancements described here, MUMmer4 can also be used to efficiently align reads to reference genomes, although it is less sensitive and accurate than the dedicated read aligners. The nucmer aligner in MUMmer4 can now be called from scripting languages such as Perl, Python and Ruby. These improvements make MUMmer4 one of the most versatile genome alignment packages available.

2,033 citations
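
A quick arithmetic check on the index widths quoted in the abstract above. The abstract does not say how the 141 Tbp limit is derived, so the sketch below only observes that $2^{47}$ positions is about $1.41 \times 10^{14}$, which matches the quoted figure, while a full 48-bit range would be roughly twice that. The helper `addressable` and the constants are hypothetical names for this example.

```python
GBP = 10 ** 9   # one gigabase pair
TBP = 10 ** 12  # one terabase pair


def addressable(bits, unit):
    """Number of distinct positions an index of the given width can address,
    expressed in the given unit."""
    return 2 ** bits / unit


print(round(addressable(32, GBP), 2))  # 4.29
print(round(addressable(47, TBP), 1))  # 140.7
print(round(addressable(48, TBP), 1))  # 281.5
```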

Journal Article•10.1145/321941.321946•

A Space-Economical Suffix Tree Construction Algorithm

Edward M. McCreight¹•Institutions (1)

PARC¹

01 Apr 1976-Journal of the ACM

TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching that has the same asymptotic running time bound as previously published algorithms, but is more economical in space.

Abstract: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching. This algorithm has the same asymptotic running time bound as previously published algorithms, but is more economical in space. Some implementation considerations are discussed, and new work on the modification of these search trees in response to incremental changes in the strings they index (the update problem) is presented.

1,778 citations

Journal Article•10.1007/BF01206331•

On-line construction of suffix trees

Esko Ukkonen¹•Institutions (1)

University of Helsinki¹

01 Sep 1995-Algorithmica

TL;DR: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string, developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries.

Abstract: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries. Regardless of its quadratic worst case this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).

1,696 citations
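
The "very simple algorithm" for suffix tries mentioned in the abstract above can be sketched as follows. This is an illustrative left-to-right suffix trie builder using suffix links, written under the assumption that it follows the general shape of the quadratic method the abstract refers to; it is not the linear-time suffix tree algorithm itself. The names `Node`, `build_suffix_trie`, `contains`, and `count_nodes` are hypothetical.

```python
class Node:
    __slots__ = ("children", "link")

    def __init__(self):
        self.children = {}  # symbol -> child Node
        self.link = None    # suffix link: node for this string minus its first symbol


def build_suffix_trie(text):
    """Build the suffix trie of text one symbol at a time, left to right."""
    root = Node()
    top = root  # node spelling out the whole text read so far
    for ch in text:
        r = top
        prev_new = None
        # Walk the suffix-link chain from the deepest node, extending
        # every suffix that does not yet have a transition on ch.
        while r is not None and ch not in r.children:
            new = Node()
            r.children[ch] = new
            if prev_new is not None:
                prev_new.link = new
            prev_new = new
            r = r.link
        # r is None when the walk has passed the root (the empty suffix).
        target = root if r is None else r.children[ch]
        if prev_new is not None:
            prev_new.link = target
        top = top.children[ch]
    return root


def contains(root, pattern):
    """Return True if pattern is a substring of the indexed text."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return False
    return True


def count_nodes(root):
    """Count trie nodes, including the root."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.children.values())
    return total


trie = build_suffix_trie("banana")
print(contains(trie, "nan"))                  # True
print(contains(trie, "nab"))                  # False
print(count_nodes(build_suffix_trie("abc")))  # 7
```

After each symbol the trie already indexes every substring of the prefix read so far, which is the on-line property the abstract highlights. The trie has one node per distinct substring plus the root, so its size can grow quadratically in the text length.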

Year	Papers
2025	4
2024	9
2023	19
2022	29
2021	20
2020	29
