Faster Compact Top-k Document Retrieval
Roberto Konow, Gonzalo Navarro
- 20 Mar 2013
- pp 351-360
TL;DR: The authors replace suffix tree sampling with frequency thresholding to obtain a compact top-k document retrieval index with O(m + (k + log log n) log log n) query time.
Abstract: An optimal index solving top-k document retrieval [Navarro and Nekrich, SODA'12] takes O(m+k) time for a pattern of length m, but its space is at least 80n bytes for a collection of n symbols. We reduce it to 1.5n-3n bytes, with O(m + (k+log log n)log log n) time, on typical texts. The index is up to 25 times faster than the best previous compressed solutions, and requires at most 5% more space in practice (and in some cases as little as one half). Apart from replacing classical by compressed data structures, our main idea is to replace suffix tree sampling by frequency thresholding to achieve compression.
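To make the problem statement concrete, the following is a minimal brute-force sketch of top-k document retrieval, assuming term frequency as the relevance score. It is not the index described in the abstract: it uses a plain, uncompressed suffix array built by naive sorting and counts frequencies by scanning the whole matching range, so it has none of the stated space or time guarantees. The names `build_index`, `sa_range`, and `top_k`, and the `"\x00"` separator, are illustrative assumptions.

```python
import heapq
from collections import Counter


def build_index(docs):
    """Concatenate the documents and build a plain suffix array (naive sort)."""
    text = ""
    doc_of = []  # doc_of[i] = id of the document that owns text position i
    for d, doc in enumerate(docs):
        text += doc + "\x00"  # assumption: "\x00" never occurs in a document
        doc_of.extend([d] * (len(doc) + 1))
    # Naive construction, fine for a demo but far from linear time.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, doc_of


def sa_range(text, sa, pattern):
    """Return [start, end) of suffix-array entries prefixed by pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def top_k(index, pattern, k):
    """Return up to k (doc_id, frequency) pairs, most frequent first."""
    text, sa, doc_of = index
    start, end = sa_range(text, sa, pattern)
    # Brute force: visit every occurrence and tally it per document.
    freq = Counter(doc_of[sa[i]] for i in range(start, end))
    # Ties are broken by the smaller document id.
    return heapq.nsmallest(k, freq.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(["abracadabra", "banana", "cabana bar"])
print(top_k(index, "a", 2))
```

The scan in `top_k` costs time proportional to the total number of occurrences of the pattern, which is exactly the cost that a dedicated top-k index is meant to avoid.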
Citations
From Theory to Practice: Plug and Play with Succinct Data Structures
Simon Gog, Timo Beller, Alistair Moffat, Matthias Petri
- 29 Jun 2014
TL;DR: This paper presents a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements.
398
Space-Efficient Frameworks for Top-k String Retrieval
TL;DR: This work presents the first linear-space framework that is capable of handling arbitrary score functions with near-optimal O(p + k log k) query time and derives compact space and succinct space indexes (for some specific score functions).
45
Improved Range Minimum Queries
Héctor Ferrada, Gonzalo Navarro
TL;DR: It is shown that using the BP representation instead makes the formula simpler, since border conditions are eliminated, and that this leads to the fastest and most compact practical implementation to date.
23
New space/time tradeoffs for top-k document retrieval on sequences
TL;DR: This work explores how far the space/time tradeoff for this problem can be pushed for top-k queries, and obtains three results.
21
Indexes for Document Retrieval with Relevance
Wing-Kai Hon, Manish Patil, Rahul Shah, Sharma V. Thankachan, Jeffrey Scott Vitter
- 01 Jan 2013
TL;DR: Document retrieval is a special type of pattern matching that is closely related to information retrieval and web searching. If the query consists of an arbitrary string, it cannot take advantage of word boundaries, and a different approach is needed.
References
Suffix arrays: a new method for on-line string searches
Udi Manber, Gene Myers
TL;DR: A new and conceptually simple data structure, called a suffix array, for on-line string searches is introduced in this paper, and it is believed that suffix arrays will prove to be better in practice than suffix trees for many applications.
2.4K
Linear pattern matching algorithms
Peter Weiner
- 15 Oct 1973
TL;DR: A linear time algorithm for obtaining a compacted version of a bi-tree associated with a given string is presented, and it is indicated how to solve several pattern matching problems, including some from [4], in linear time.
2.1K
High-order entropy-compressed text indexes
Roberto Grossi, Ankur Gupta, Jeffrey Scott Vitter
- 12 Jan 2003
TL;DR: A novel implementation of compressed suffix arrays is presented, exhibiting new tradeoffs between search time and space occupancy for a given text (or sequence) of n symbols over an alphabet Σ, where each symbol is encoded by lg|Σ| bits.
•Book
Information Retrieval: Implementing and Evaluating Search Engines
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack
- 23 Jul 2010
TL;DR: Information Retrieval offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation, and is a valuable reference for professionals in computer science, computer engineering, and software engineering.
561
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays
Johannes Fischer, Volker Heun
TL;DR: This work builds a data structure that allows us to answer efficiently subsequent on-line queries of the form “what is the position of a minimum element in the subarray ranging from $i$ to $j$?”
312