Generalized Substring Compression

Book Chapter•10.1007/978-3-642-02441-2_3

Orgad Keller, +3 more

- 18 Jun 2009

- pp 26-38

23

TL;DR: This work presents the first non-trivial correct algorithm for generalized substring compression. As part of the algorithm, it proposes a method for finding the bounded longest common prefix of substrings, which may be of independent interest.

Abstract: In substring compression one is given a text to preprocess so that, upon request, a compressed substring is returned. Generalized substring compression is the same with the following twist: the queries contain an additional context substring (or a collection of context substrings), and the answer is the substring in compressed format, where the context substring is used to make the compression more efficient. We focus our attention on generalized substring compression and present the first non-trivial correct algorithm for this problem. In our algorithm we inherently propose a method for finding the bounded longest common prefix of substrings, which may be of independent interest. In addition, we propose an efficient algorithm for substring compression which makes use of range searching for minimum queries. We present several tradeoffs for both problems. For compressing the substring $S[i..j]$ (possibly with the substring $S[\alpha..\beta]$ as a context), the best query times we achieve are $O(C)$ and $O\big(C\log\big(\frac{j-i}{C}\big)\big)$ for a substring compression query and a generalized substring compression query, respectively, where $C$ is the number of phrases encoded.
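
To make the query semantics concrete, the following is a minimal sketch of what a (generalized) substring compression query returns, assuming a greedy LZ77-style phrase parse. It is a brute-force illustration only: it does no preprocessing and does not achieve the query times stated in the abstract. The function names (`lz77_phrases`, `compress_substring`, `decompress`), the phrase encoding, and the use of a sentinel between context and target are assumptions made for this example, not the authors' data structures.

```python
def lz77_phrases(text, start):
    """Greedy LZ77-style parse of text[start:] (naive, for illustration).

    A phrase is either ("lit", symbol) or ("copy", source_index, length).
    A source may start anywhere before the current position and may overlap it.
    """
    phrases = []
    pos, n = start, len(text)
    while pos < n:
        best_len, best_src = 0, -1
        for src in range(pos):
            k = 0
            while pos + k < n and text[src + k] == text[pos + k]:
                k += 1
            if k > best_len:  # strict '>' keeps the earliest longest match
                best_len, best_src = k, src
        if best_len == 0:
            phrases.append(("lit", text[pos]))
            pos += 1
        else:
            phrases.append(("copy", best_src, best_len))
            pos += best_len
    return phrases


def compress_substring(s, i, j, context=None):
    """Phrases for s[i..j] (inclusive), optionally with context s[alpha..beta].

    With a context, source indices refer to the sequence
    context + [sentinel] + target; the sentinel stops any match from
    running out of the context into the target.
    """
    target = list(s[i:j + 1])
    if context is None:
        return lz77_phrases(target, 0)
    alpha, beta = context
    prefix = list(s[alpha:beta + 1]) + [object()]  # unique sentinel
    return lz77_phrases(prefix + target, len(prefix))


def decompress(phrases, prefix=()):
    """Invert lz77_phrases; `prefix` is the context plus one placeholder."""
    out = list(prefix)
    for phrase in phrases:
        if phrase[0] == "lit":
            out.append(phrase[1])
        else:
            _, src, length = phrase
            for k in range(length):
                out.append(out[src + k])
    return "".join(out[len(prefix):])


s = "abababcabab"
plain = compress_substring(s, 7, 10)                      # "abab" on its own
with_ctx = compress_substring(s, 7, 10, context=(0, 5))   # context "ababab"
```

In this toy input the context reduces the number of phrases C for `s[7..10]` from three (two literals and one copy) to a single copy, which is the effect the generalized query is meant to exploit.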

Citations

•Journal Article•10.1016/J.TCS.2013.11.018

Extracting powers and periods in a word from its runs structure

Maxime Crochemore, +5 more

- 01 Feb 2014

- Theoretical Computer Science

TL;DR: Lyndon words are used, and the Lyndon structure of runs is introduced as a useful tool for computing powers; for problems related to periods, some versions of the Manhattan skyline problem are used.

84

•Journal Article•10.1016/J.TCS.2013.10.010

Generalized substring compression

Orgad Keller, +3 more

- 01 Mar 2014

- Theoretical Computer Science

TL;DR: An efficient algorithm for substring compression which makes use of range successor queries and a new method for finding the bounded longest common prefix of substrings, which may be of independent interest are proposed.

29

•Posted Content

Minimal Suffix and Rotation of a Substring in Optimal Time

Tomasz Kociumaka

- 29 Jan 2016

- arXiv: Data Structures and Algorithms

TL;DR: This paper studies substring minimal suffix queries, which ask for the lexicographically minimal non-empty suffix of a substring specified by the location of its occurrence in the text.

19

•Journal Article•10.1007/S00453-021-00821-Y

Internal Dictionary Matching

Panagiotis Charalampopoulos, +6 more

- 17 Apr 2021

- Algorithmica

TL;DR: Data structures answering queries concerning the occurrences of patterns from a given dictionary in fragments of a given string T of length n are introduced and tight—up to subpolynomial factors—upper and lower bounds for the case of a dynamic dictionary are provided.

14

•Book Chapter•10.1007/978-3-319-02432-5_29

Faster Range LCP Queries

Manish Patil, +2 more

- 07 Oct 2013

TL;DR: This paper describes a linear space data structure with $O((j-i)^{1/2}\log^{\epsilon}(j-i))$ query time, where $\epsilon > 0$ is any constant, improving on the linear space, $O((j-i)\log\log n)$ query time solution by Amir et al.

14

References

Journal Article•10.1109/TIT.1977.1055714

A universal algorithm for sequential data compression

Jacob Ziv, +1 more

- 01 May 1977

- IEEE Transactions on Information Theory

TL;DR: The compression ratio achieved by the proposed universal code uniformly approaches the lower bounds on the compression ratios attainable by block-to-variable codes and variable-to-block codes designed to match a completely specified source.

6.3K

Proceedings Article•10.1109/SWAT.1973.13

Linear pattern matching algorithms

Peter Weiner

- 15 Oct 1973

TL;DR: A linear time algorithm for obtaining a compacted version of a bi-tree associated with a given string is presented, and it is indicated how to solve several pattern matching problems, including some from [4], in linear time.

2.1K

Journal Article•10.1145/321941.321946

A Space-Economical Suffix Tree Construction Algorithm

Edward M. McCreight

- 01 Apr 1976

- Journal of the ACM

TL;DR: A new algorithm is presented for constructing auxiliary digital search trees to aid in exact-match substring searching that has the same asymptotic running time bound as previously published algorithms, but is more economical in space.

1.7K

Journal Article•10.1007/BF01206331

On-line construction of suffix trees

Esko Ukkonen

- 01 Sep 1995

- Algorithmica

TL;DR: An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string, developed as a linear-time version of a very simple algorithm for (quadratic size) suffix tries.

1.6K

Journal Article•10.1137/0213024

Fast algorithms for finding nearest common ancestors

Dov Harel, +1 more

- 17 May 1984

- SIAM Journal on Computing

TL;DR: An algorithm is presented for a random access machine with uniform cost measure (and a bound of $\Omega(\log n)$ on the number of bits per word) that requires $O(1)$ time per query and $O(n)$ preprocessing time, assuming that the collection of trees is static.

1.3K

Related Papers (5)

Generalized substring compression

Substring compression problems

On shortest unique substring queries

Finding Characteristic Substrings from Compressed Texts.

Substring range reporting