Open Access, Posted Content
String algorithms and data structures
TL;DR: This survey illustrates the key ideas that should constitute, in the authors' opinion, the current background of every index designer.
Abstract: The string-matching field has grown to such a level of complexity that various issues come into play when studying it: data structure and algorithmic design, database principles, compression techniques, architectural features, and cache and prefetching policies. The expertise now required to design good string data structures and algorithms therefore cuts across many computer science fields, and much more study of the orchestration of known, or novel, techniques is needed to make progress on this fascinating topic. This survey is aimed at illustrating the key ideas that should constitute, in our opinion, the current background of every index designer. We also discuss the positive features and drawbacks of known indexing schemes and algorithms, and devote much attention to detailing research issues and open problems on both the theoretical and the experimental side.
Citations
Information storage and retrieval
Susan Brewer
- 01 Sep 1959
TL;DR: The letter and/or sound combinations that make up a human language are limited by the human's ability to pronounce these sounds. Therefore, the standard library search, which as a rule looks for all possible combinations of letters to find a word, is wasteful.
351
Shift-And Approach to Pattern Matching in LZW Compressed Text
Takuya Kida,拓也 喜田,Masayuki Takeda,正幸 竹田,Ayumi Shinohara,歩 篠原,Setsuo Arikawa,節夫 有川 +7 more
- 01 Jan 1999
TL;DR: In this article, the Shift-And algorithm was used to solve the problem of pattern matching in LZW compressed text, where a pattern length is at most 32 or the word length.
58
Boosting Text Compression with Word-Based Statistical Encoding
TL;DR: A new suffix-free Dense-Code-based compressor that compresses slightly better and some self-indexes can handle non-suffix-free codes is presented, which allows indexed searches for both words and phrases.
A randomized Numerical Aligner (rNA)
TL;DR: A generalization of the classical Rabin-Karp string matching algorithm that solves the k-mismatch problem with average complexity O(n+m), where n and m are the text and pattern lengths, respectively, and that is in general faster and more accurate than other available tools like SOAP2, BWA, and BOWTIE.
14
A randomized numerical aligner (rNA)
Alberto Policriti,Alexandru I. Tomescu,Francesco Vezzi +2 more
- 24 May 2010
TL;DR: rNA (randomized Numerical Aligner)—outperforms available tools like SOAP2, BWA, and BOWTIE, processing up to 10 times more patterns per second on texts of (practically) significant lengths.
8
References
•Book
Modern Information Retrieval
Ricardo Baeza-Yates,Berthier Ribeiro-Neto +1 more
- 15 May 1999
TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.
•Book
Foundations of Statistical Natural Language Processing
Christopher D. Manning,Hinrich Schütze +1 more
- 28 May 1999
TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Space/time trade-offs in hash coding with allowable errors
TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
A universal algorithm for sequential data compression
Jacob Ziv,A. Lempel +1 more
TL;DR: The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.