String algorithms and data structures

Open Access • Posted Content

- 15 Jan 2008

11

TL;DR: This survey aims to illustrate the key ideas that should constitute, in the authors' opinion, the current background of every index designer.

Citations

Proceedings Article•10.1145/612201.612221

Information storage and retrieval

Susan Brewer

- 01 Sep 1959

TL;DR: The letter and/or sound combinations that make up a human language are limited by the human's ability to pronounce these sounds. Therefore, the standard library search, which as a rule looks for all possible combinations of letters to find a word, is wasteful.

351

Shift-And Approach to Pattern Matching in LZW Compressed Text

Takuya Kida, +7 more

- 01 Jan 1999

TL;DR: The Shift-And algorithm is used to solve the problem of pattern matching in LZW compressed text, where the pattern length is at most 32 or the word length.

58

Journal Article•10.1093/COMJNL/BXR096

Boosting Text Compression with Word-Based Statistical Encoding

Antonio Fariña, +2 more

- 01 Jan 2012

- The Computer Journal

TL;DR: A new suffix-free Dense-Code-based compressor that compresses slightly better and some self-indexes can handle non-suffix-free codes is presented, which allows indexed searches for both words and phrases.

19

Journal Article•10.1016/J.JCSS.2011.12.007

A randomized Numerical Aligner (rNA)

Alberto Policriti, +2 more

- 01 Nov 2012

- Journal of Computer and System Sciences

TL;DR: A generalization of the classical Rabin-Karp string matching algorithm is presented that solves the k-mismatch problem with average complexity O(n+m), where n and m are the text and pattern lengths, respectively; it is in general faster and more accurate than other available tools like SOAP2, BWA, and BOWTIE.

14

Book Chapter•10.1007/978-3-642-13089-2_43

A randomized numerical aligner (rNA)

Alberto Policriti, +2 more

- 24 May 2010

TL;DR: rNA (randomized Numerical Aligner) outperforms available tools like SOAP2, BWA, and BOWTIE, processing up to 10 times more patterns per second on texts of (practically) significant lengths.

8

References

Book

Modern Information Retrieval

Ricardo Baeza-Yates, +1 more

- 15 May 1999

TL;DR: The authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, providing an up-to-date, student-oriented treatment of the subject.

11.6K

Book

Foundations of Statistical Natural Language Processing

Christopher D. Manning, +1 more

- 28 May 1999

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.

10.9K

Journal Article•10.1145/362686.362692

Space/time trade-offs in hash coding with allowable errors

Burton H. Bloom

- 01 Jul 1970

- Communications of The ACM

TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.

8.3K

Book

Human behavior and the principle of least effort

George Kingsley Zipf

- 01 Jan 1949

7.7K

Journal Article•10.1109/TIT.1977.1055714

A universal algorithm for sequential data compression

Jacob Ziv, +1 more

- 01 May 1977

- IEEE Transactions on Information Theory

TL;DR: The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.

6.3K

Related Papers (5)

A guided tour to approximate string matching

Suffix arrays: a new method for on-line string searches

The Power of String Solving: Simplicity of Comparison

Combinatorial Algorithms in Scientific Computing

An analysis framework addressing the scale and legibility of large scientific data sets