Universal text preprocessing for data compression

doi:10.1109/TC.2005.85

Journal Article•10.1109/TC.2005.85

J. Abel, +1 more

- 01 May 2005

- IEEE Transactions on Computers

- Vol. 54, Iss: 5, pp 497-507

55

TL;DR: Several complementary preprocessing algorithms for text files are presented. They are applied before the compression scheme, and the resulting compression gain is compared against the cost in speed for the BWT, PPM, and LZ compression schemes.
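
To make the idea of preprocessing before compression concrete, here is a minimal sketch, not the paper's implementation. It applies one simple reversible transform (flagging word-initial ASCII capitals and lowercasing them) and then measures compressed sizes with Python's standard `zlib` (an LZ-family compressor) and `bz2` (a BWT-based compressor). The function names, the `ESC` flag character, and the choice of transform are assumptions made for illustration. The standard library has no PPM compressor, so PPM is not covered.

```python
import bz2
import zlib

# Hypothetical flag character; assumed never to occur in the input text.
ESC = "\x01"


def encode_caps(text: str) -> str:
    """Replace each word-initial ASCII capital with ESC + its lowercase form."""
    out = []
    prev_alpha = False
    for ch in text:
        if "A" <= ch <= "Z" and not prev_alpha:
            out.append(ESC + ch.lower())
        else:
            out.append(ch)
        prev_alpha = ch.isalpha()
    return "".join(out)


def decode_caps(text: str) -> str:
    """Invert encode_caps: uppercase the character following each ESC."""
    out = []
    flagged = False
    for ch in text:
        if flagged:
            out.append(ch.upper())
            flagged = False
        elif ch == ESC:
            flagged = True
        else:
            out.append(ch)
    return "".join(out)


def compressed_sizes(text: str) -> dict:
    """Return compressed byte counts with and without the preprocessing step."""
    raw = text.encode("utf-8")
    pre = encode_caps(text).encode("utf-8")
    return {
        "lz_raw": len(zlib.compress(raw, 9)),
        "lz_pre": len(zlib.compress(pre, 9)),
        "bwt_raw": len(bz2.compress(raw, 9)),
        "bwt_pre": len(bz2.compress(pre, 9)),
    }
```

Calling `compressed_sizes` on a large text file gives a rough before/after comparison for each compressor. Whether the transform helps depends on the input and the compressor, so the sketch makes no claim about the size of any gain.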

Citations

Journal Article•10.1016/J.JKSUCI.2018.05.006

A survey on data compression techniques: From the perspective of data quality, coding schemes, data type and applications

Uthayakumar Jayasankar, +2 more

- 01 Feb 2021

- Journal of King Saud University - Comput...

TL;DR: Provides insight into various open issues and research directions, identifying promising areas for future developments in data compression techniques and their applications.

247

Journal Article•10.1002/CPE.882

MEAD: support for Real‐Time Fault‐Tolerant CORBA

Priya Narasimhan, +6 more

- 01 Oct 2005

- Concurrency and Computation: Practice an...

TL;DR: The MEAD (Middleware for Embedded Adaptive Dependability) system attempts to identify and to reconcile the conflicts between real‐time and fault tolerance, in a resource‐aware manner, for distributed CORBA applications.

76

Journal Article•10.1002/SPE.678

Revisiting dictionary‐based compression

Przemysław Skibiński, +2 more

- 01 Dec 2005

- Software - Practice and Experience

TL;DR: This paper discusses several aspects of dictionary‐based compression, including compact dictionary representation, and presents a PPM/BWCA‐oriented scheme, word replacing transformation, achieving compression ratios higher by 2–6% than the state‐of‐the‐art StarNT (2003) text preprocessor.

60

Dissertation

Adaptive models of Arabic text

Khaled M. Alhawiti

- 01 Jan 2014

TL;DR: Two new adaptive models, BS-PPM and CS-PPM, based on the Prediction by Partial Matching (PPM) compression scheme, are introduced to improve the compression performance of the standard PPM model by using preprocessing techniques.

35

Journal Article•10.1016/j.mex.2022.101894

Analyzing tourism reviews using an LDA topic-based sentiment analysis approach

Twil Ali, +2 more

- 01 Nov 2022

- MethodsX

TL;DR: In this article, a combination of topic modeling and sentiment analysis, together with human validation of topic labels, was employed to extract valuable insights about the city of Marrakech from TripAdvisor reviews.

20
