Journal Article, DOI: 10.1145/1217856.1217858
Linear work suffix array construction
TL;DR: A simple linear-time suffix array construction algorithm, DC3, is generalized to DC, which allows a space-efficient implementation and supports the choice of a space-time tradeoff. Variants for the BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
Abstract: Suffix trees and suffix arrays are widely used and largely interchangeable index structures on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space efficiency, while theoreticians use suffix trees due to linear-time construction algorithms and more explicit structure. We narrow this gap between theory and practice with a simple linear-time construction algorithm for suffix arrays. The simplicity is demonstrated with a C++ implementation of 50 effective lines of code. The algorithm is called DC3, which stems from the central underlying concept of difference cover. This view leads to a generalized algorithm, DC, that allows a space-efficient implementation and, moreover, supports the choice of a space-time tradeoff. For any v ∈ [1, √n], it runs in O(vn) time using O(n/√v) space in addition to the input string and the suffix array. We also present variants of the algorithm for several parallel and hierarchical memory models of computation. The algorithms for BSP and EREW-PRAM models are asymptotically faster than all previous suffix tree or array construction algorithms.
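To make the two central terms of the abstract concrete, the following Python sketch may help. It is not the paper's 50-line C++ implementation and not a linear-time algorithm: `naive_suffix_array` is a deliberately simple sorting baseline that only shows what a suffix array contains, and the two helpers check the difference cover property. All function names are hypothetical, and the use of the set {1, 2} modulo 3 as the cover behind DC3 is an assumption drawn from the usual description of the algorithm, not from the abstract itself.

```python
def naive_suffix_array(text: str) -> list:
    """Return the start positions of all suffixes of `text` in lexicographic order.

    Baseline for illustration only: sorting sliced suffixes is far slower
    than the linear-time construction the abstract describes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def is_difference_cover(cover: set, v: int) -> bool:
    """Check that every residue modulo v is a difference of two cover elements."""
    differences = {(a - b) % v for a in cover for b in cover}
    return differences == set(range(v))


def common_shift(i: int, j: int, cover: set, v: int) -> int:
    """Return the smallest shift l in [0, v) that moves both i and j into the cover.

    For a difference cover such a shift always exists, so two suffixes
    starting at i and j can be compared after at most v - 1 characters
    by looking up the order of the shifted (sampled) suffixes.
    """
    for l in range(v):
        if (i + l) % v in cover and (j + l) % v in cover:
            return l
    raise ValueError("cover is not a difference cover modulo v")


if __name__ == "__main__":
    print(naive_suffix_array("banana"))        # [5, 3, 1, 0, 4, 2]
    print(is_difference_cover({1, 2}, 3))      # True
    print(common_shift(0, 2, {1, 2}, 3))       # 2
```

The baseline is useful mainly as a correctness oracle when testing a faster construction on small inputs. The `common_shift` helper hints at why a larger modulus v might trade time for space: fewer sampled suffixes need to be sorted recursively, at the cost of comparing more characters directly.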