Improving PPM Algorithm Using Dictionaries
Yichuan Hu, Jianzhong Zhang, Farooq Khan, Ying Li
- 29 Mar 2011
- pp 459-459
TL;DR: In this paper, a character-based PPM text compression algorithm for natural languages is proposed, in which non-words and prefixes of words are encoded using character-based context models and suffixes of words are encoded using dictionary models.
Abstract: We propose a method to improve the traditional character-based PPM text compression algorithm [1] for natural languages. Considering a text file as a sequence of alternating words and non-words, the basic idea of our algorithm is to encode non-words and prefixes of words using character-based context models and to encode suffixes of words using dictionary models. By using dictionary models, the algorithm can encode multiple characters as a whole, thereby improving compression efficiency. The advantages of the proposed algorithm are: 1) it does not require any text preprocessing, 2) it does not need any explicit codeword to signal a switch between context and dictionary models, and 3) it can be applied to any character-based PPM algorithm without incurring much additional computational cost.
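The word/non-word split described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation and does no entropy coding; it only decides which characters would go to the character-based context model and which to the dictionary model. The tokenizer (runs of ASCII letters count as words), the helper names `tokenize`, `split_word` and `segment`, and the switching rule (hand a word to the dictionary model at the shortest prefix that leaves exactly one dictionary candidate) are all hypothetical assumptions; the abstract does not say how the switch point is chosen. Because this switch point is a function of the already-decoded prefix and a dictionary shared by both sides, a decoder could locate it without a switch codeword. Words outside the dictionary would still need some extra mechanism (for example an escape, as in PPM), which this sketch omits.

```python
import re

# Hypothetical tokenizer (assumption): a word is a run of ASCII letters,
# a non-word is any run of other characters.
TOKEN_RE = re.compile(r"([A-Za-z]+)|([^A-Za-z]+)")


def tokenize(text):
    """Yield (is_word, token) pairs for alternating words and non-words."""
    for m in TOKEN_RE.finditer(text):
        yield m.group(1) is not None, m.group(0)


def split_word(word, dictionary):
    """Split a word into (prefix, suffix).

    Assumed rule (not from the paper): the prefix is the shortest one that
    leaves exactly one dictionary word as a candidate; the rest is the
    suffix. Words not in the dictionary stay entirely with the context model.
    """
    if word not in dictionary:
        return word, ""
    for i in range(1, len(word) + 1):
        candidates = [w for w in dictionary if w.startswith(word[:i])]
        if len(candidates) == 1:
            return word[:i], word[i:]
    # The word is a proper prefix of another dictionary word, so no prefix
    # of it ever leaves a single candidate.
    return word, ""


def segment(text, dictionary):
    """Return (model, chunk) pairs, where model is "context" or "dictionary"."""
    plan = []
    for is_word, tok in tokenize(text):
        if not is_word:
            # Non-words are always coded by the character-based context model.
            plan.append(("context", tok))
            continue
        prefix, suffix = split_word(tok, dictionary)
        plan.append(("context", prefix))
        if suffix:
            plan.append(("dictionary", suffix))
    return plan


if __name__ == "__main__":
    words = {"prediction", "partial", "matching", "match"}
    for model, chunk in segment("prediction by partial matching.", words):
        print(f"{model:10} {chunk!r}")
```

With this toy dictionary, "prediction" becomes the context-model prefix "pr" followed by the dictionary-model suffix "ediction", while "by" (not in the dictionary), the spaces and the final period stay with the context model. The linear scan over the dictionary only keeps the sketch short; a trie would likely suit a realistic dictionary better.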
Citations
Improving PPM Algorithm Using Dictionaries
- Posted Content
TL;DR: This work proposes a method to improve the traditional character-based PPM text compression algorithm for natural languages by using dictionary models, which can encode multiple characters as a whole and thus improve compression efficiency.
Rapid lossless compression of short text messages
TL;DR: b64pack is an efficient method for compressing short text messages; it is built on standards that facilitate easy deployment and interoperability, and it is faster than compress, gzip and bzip2 by orders of magnitude.
References
Data Compression Using Adaptive Coding and Partial String Matching
John G. Cleary, Ian H. Witten
TL;DR: This paper describes how the conflict can be resolved with partial string matching, and reports experimental results which show that mixed-case English text can be coded in as little as 2.2 bits/character with no prior knowledge of the source.
Implementing the PPM data compression scheme
TL;DR: It is shown that the estimates made by Cleary and Witten of the resources required to implement the PPM scheme can be revised to allow for a tractable and useful implementation.
Modeling for text compression
TL;DR: This paper surveys successful strategies for adaptive modeling that are suitable for use in practical text compression systems. These strategies fall into three main classes, including finite-context modeling, in which the last few characters are used to condition the probability distribution for the next one.
PPM: one step to practicality
D. Shkarin
- 02 Apr 2002
TL;DR: This paper presents a PPM implementation whose complexity is comparable to that of widespread practical compression schemes based on the LZ77, LZ78 and BWT algorithms.
The entropy of English using PPM-based models
William J. Teahan, John G. Cleary
- 31 Mar 1996
TL;DR: The importance of training text for PPM is demonstrated, showing that its performance can be improved by "adjusting" the alphabet used, and the results based on these improvements are given.