Open Access
Efficient Variable-to-Fixed Length Coding Algorithms for Text Compression
Satoshi Yoshida
- 01 Jan 2014
TL;DR: This thesis focuses on lossless compression for text data, that is, text compression, and Variable-to-Fixed-length coding, a coding scheme that segments an input text into a consecutive sequence of substrings and then assigns a fixed length codeword to each substring.
Abstract: Data compression is a technique for reducing storage space and the cost of transferring large amounts of data by exploiting redundancy hidden in the data. In this thesis we focus on lossless compression of text data, that is, text compression. When a huge amount of data stored in secondary storage is reused, I/O speed becomes the bottleneck. This communication-speed problem can be relieved if only compressed data is transferred through the communication channel and, furthermore, every necessary process, such as string search, can be performed on the compressed data itself without decompression. A new criterion, “ease of processing the compressed data”, is therefore required in the field of data compression. Developing compression algorithms is currently the mainstream of the field, but many of these algorithms do not satisfy that criterion. Algorithms that employ variable-length codewords achieve extremely good compression ratios, but the boundaries between codewords are not obvious without special processing. This “unclear boundary problem” prevents direct access to the compressed data. In contrast, Variable-to-Fixed-length coding, referred to as VF coding, is promising for this purpose. VF coding is a coding scheme that segments an input text into a consecutive sequence of substrings (called phrases) and then assigns a fixed-length codeword to each substring. Boundaries between VF codewords are obvious because all codewords have the same length. Therefore, we can realize “accessible data compression” with VF coding. Nevertheless, VF coding was not paid much attention
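The VF coding idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the phrase dictionary here is hand-picked, parsing is a simple greedy longest match, and the names `build_dictionary`, `vf_encode`, `phrase_at`, and `vf_decode` are hypothetical. The point is only to show why fixed-length codewords make boundaries obvious: the k-th codeword always starts at byte offset k * width.

```python
def build_dictionary(alphabet, extra_phrases):
    """Return a sorted phrase list (assumed, hand-picked dictionary).

    Every single symbol is included so that any text over the
    alphabet can always be segmented into phrases.
    """
    return sorted(set(alphabet) | set(extra_phrases))


def vf_encode(text, phrases, width=1):
    """Segment text greedily into phrases; emit one fixed-width code each."""
    if len(phrases) > 256 ** width:
        raise ValueError("too many phrases for the chosen codeword width")
    index = {phrase: code for code, phrase in enumerate(phrases)}
    max_len = max(len(p) for p in phrases)
    out = bytearray()
    pos = 0
    while pos < len(text):
        # Greedy longest match: try the longest candidate phrase first.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            code = index.get(text[pos:pos + length])
            if code is not None:
                out += code.to_bytes(width, "big")
                pos += length
                break
        else:
            raise ValueError(f"symbol {text[pos]!r} is not in the dictionary")
    return bytes(out)


def phrase_at(encoded, k, phrases, width=1):
    """Decode only the k-th codeword; its boundary is known from k and width."""
    code = int.from_bytes(encoded[k * width:(k + 1) * width], "big")
    return phrases[code]


def vf_decode(encoded, phrases, width=1):
    """Decode the whole codeword sequence back into the original text."""
    count = len(encoded) // width
    return "".join(phrase_at(encoded, k, phrases, width) for k in range(count))


# Usage with a toy dictionary over the alphabet {a, b}.
phrases = build_dictionary("ab", ["ab", "aba", "bb"])
encoded = vf_encode("abababb", phrases)
decoded = vf_decode(encoded, phrases)
third_phrase = phrase_at(encoded, 2, phrases)  # direct access, no full decode
```

With a variable-length code, locating the third codeword would require scanning the preceding ones; here it is a single slice of the encoded bytes.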
Citations
•Posted Content
Random Access to Grammar Compressed Strings
TL;DR: Two representations of a string of length n compressed into a context-free grammar of size n achieving random access time and several new techniques and data structures of independent interest are introduced, including a predecessor data structure, two "biased" weighted ancestor data structures, and a compact representation of heavy- paths in grammars.
Shift-And Approach to Pattern Matching in LZW Compressed Text
Takuya Kida, Masayuki Takeda, Ayumi Shinohara, Setsuo Arikawa
- 01 Jan 1999
TL;DR: In this article, the Shift-And algorithm was used to solve the problem of pattern matching in LZW compressed text, where a pattern length is at most 32 or the word length.
Context-sensitive grammar transform: Compression and pattern matching
TL;DR: In this article, a greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm.
•Journal Article
Fast $q$-gram Mining on SLP Compressed Strings
Keisuke Goto, Hideo Bannai, Shunsuke Inenaga, Masayuki Takeda
TL;DR: An O(qn) time and space algorithm that computes the occurrence frequencies of all q-grams in T, namely, as a straight line program (SLP), which is practical for small q.