TL;DR: A lossless neural network data compressor is proposed that uses N levels of neural network and a weighted average of N pattern-level predictors; it replaces the PPM predictor, which matches the context of the last few characters to previous occurrences in the input.
Abstract: The present invention is a system and method for lossless compression of data. The invention consists of a neural network data compressor composed of N levels of neural network using a weighted average of N pattern-level predictors. This new concept uses context mixing algorithms combined with network learning algorithm models. The invention replaces the PPM predictor, which matches the context of the last few characters to previous occurrences in the input, with an N-layer neural network trained by back propagation to assign pattern probabilities when given the context as input. The N-layer network described below learns and predicts in a single pass, and compresses a similar quantity of patterns according to their adaptive context models generated in real time. The context flexibility of the present invention ensures that the described system and method is suited for compressing any type of data, including inputs of combinations of different data types.
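The weighted-average mixing described in the abstract above can be illustrated with a minimal sketch. It is not the patented N-layer network: it assumes binary symbols, a single mixing layer, and softmax-normalised weights trained by one gradient step per symbol on the coding loss. The class name `WeightedAverageMixer` and the learning rate are hypothetical.

```python
import math


def softmax(zs):
    """Turn unconstrained scores into non-negative weights that sum to 1."""
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]


class WeightedAverageMixer:
    """Mixes N bit predictors by a learned weighted average (illustrative only)."""

    def __init__(self, n_predictors, lr=0.5):
        self.z = [0.0] * n_predictors  # unconstrained weight scores
        self.lr = lr                   # assumed learning rate
        self.w = softmax(self.z)
        self.probs = [0.5] * n_predictors
        self.p = 0.5

    def predict(self, probs):
        """probs[i] is predictor i's probability that the next bit is 1."""
        self.w = softmax(self.z)
        self.probs = list(probs)
        self.p = sum(w * p for w, p in zip(self.w, self.probs))
        return self.p

    def update(self, bit):
        """One gradient step on -log P(bit), taken right after predict()."""
        eps = 1e-9
        p_bit = self.p if bit else 1.0 - self.p
        p_bit = min(max(p_bit, eps), 1.0)
        for i, (w, p) in enumerate(zip(self.w, self.probs)):
            q = p if bit else 1.0 - p  # predictor i's probability of the seen bit
            # d(-log p_bit)/dz_i = -w_i * (q_i - p_bit) / p_bit
            self.z[i] += self.lr * w * (q - p_bit) / p_bit


if __name__ == "__main__":
    mixer = WeightedAverageMixer(2)
    for _ in range(200):
        mixer.predict([0.9, 0.1])  # predictor 0 is right, predictor 1 is wrong
        mixer.update(1)
    print(mixer.predict([0.9, 0.1]))
```

Predicting and updating on every symbol keeps the scheme single-pass: a decoder can repeat the same updates from the decoded bits, so no weights need to be transmitted.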
TL;DR: GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models, is created and benchmarked as a reference-free DNA compressor in 5 datasets.
Abstract: Background The increasing production of genomic data has led to an intensified need for models that can cope efficiently with the lossless compression of DNA sequences. Important applications include long-term storage and compression-based data analysis. In the literature, only a few recent articles propose the use of neural networks for DNA sequence compression. However, they fall short when compared with specific DNA compression tools, such as GeCo2. This limitation is due to the absence of models specifically designed for DNA sequences. In this work, we combine the power of neural networks with specific DNA models. For this purpose, we created GeCo3, a new genomic sequence compressor that uses neural networks for mixing multiple context and substitution-tolerant context models. Findings We benchmark GeCo3 as a reference-free DNA compressor in 5 datasets, including a balanced and comprehensive dataset of DNA sequences, the Y-chromosome and human mitogenome, 2 compilations of archaeal and virus genomes, 4 whole genomes, and 2 collections of FASTQ data of a human virome and ancient DNA. GeCo3 achieves a solid improvement in compression over the previous version (GeCo2) of $2.4\%$, $7.1\%$, $6.1\%$, $5.8\%$, and $6.0\%$, respectively. To test its performance as a reference-based DNA compressor, we benchmark GeCo3 in 4 datasets constituted by the pairwise compression of the chromosomes of the genomes of several primates. GeCo3 improves the compression by $12.4\%$, $11.7\%$, $10.8\%$, and $10.1\%$ over the state of the art. The cost of this compression improvement is some additional computational time (1.7-3 times slower than GeCo2). The RAM use is constant, and the tool scales efficiently, independently of the sequence size. Overall, these values outperform the state of the art. Conclusions GeCo3 is a genomic sequence compressor with a neural network mixing approach that provides additional gains over top specific genomic compressors. The proposed mixing method is portable, requiring only the probabilities of the models as inputs, providing easy adaptation to other data compressors or compression-based data analysis tools. GeCo3 is released under GPLv3 and is available for free download at https://github.com/cobilab/geco3.
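To illustrate what "requiring only the probabilities of the models as inputs" could look like, here is a minimal sketch of a mixer over the four nucleotides. It is not GeCo3's actual network: it assumes a single softmax layer with no hidden units or derived features, trained online with cross-entropy. The class name `SoftmaxMixer`, the learning rate, and the toy models in the usage part are all hypothetical.

```python
import math

ALPHABET = "ACGT"


class SoftmaxMixer:
    """Single-layer softmax mixer over per-model nucleotide distributions."""

    def __init__(self, n_models, lr=0.1):
        self.n_in = len(ALPHABET) * n_models
        # w[k][j] connects input j to the output for symbol ALPHABET[k].
        self.w = [[0.0] * self.n_in for _ in ALPHABET]
        self.lr = lr  # assumed learning rate
        self.x = [0.0] * self.n_in
        self.out = [1.0 / len(ALPHABET)] * len(ALPHABET)

    def mix(self, model_probs):
        """model_probs: one [P(A), P(C), P(G), P(T)] list per model."""
        self.x = [p for probs in model_probs for p in probs]
        logits = [sum(wj * xj for wj, xj in zip(row, self.x)) for row in self.w]
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        self.out = [e / total for e in exps]
        return self.out

    def learn(self, symbol):
        """Cross-entropy gradient step for the symbol that actually occurred."""
        target = ALPHABET.index(symbol)
        for k, row in enumerate(self.w):
            err = (1.0 if k == target else 0.0) - self.out[k]
            for j in range(self.n_in):
                row[j] += self.lr * err * self.x[j]


if __name__ == "__main__":
    mixer = SoftmaxMixer(n_models=2)
    uniform = [0.25] * 4
    for i in range(400):
        sym = ALPHABET[i % 4]
        informed = [0.7 if s == sym else 0.1 for s in ALPHABET]  # toy model
        mixed = mixer.mix([informed, uniform])
        mixer.learn(sym)
    print(mixed)
```

Because the mixer only sees probability vectors, the models feeding it can be swapped without changing the mixing code, which is the portability property the abstract points to.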
TL;DR: The described embodiments compress and decompress one-dimensional time series of floating point numbers using an optional lossy quantization stage followed by a lossless context mixing stage, which combines a direct context model with a model driven by the output of an adaptive filter.
Abstract: Embodiments described herein relate to compression and decompression of data consisting of a one dimensional time series of floating point numbers. A compressor may comprise a lossless stage and in some embodiments a lossy stage in addition to the lossless stage. The lossy stage quantizes the data by discarding some of the least significant bits as specified by the user. The lossless stage uses a context mixing algorithm with two bit-wise predictive models whose predictions are combined and fed to an arithmetic coder. One model is a direct context model using the most significant bits of prior numeric samples as context. The other model is the output of an adaptive filter, in which the approximate predicted numeric value is used as context to model the actual value. A corresponding decompressor uses the same lossless model with the arithmetic coder replaced by an arithmetic decoder.
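The lossy stage and the direct context model both operate on the bit pattern of each sample. The sketch below shows one way this could look, assuming IEEE 754 double precision samples; the abstract does not state the float width, and the function names `discard_low_bits` and `msb_context` are hypothetical.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def float_to_bits(x):
    """Reinterpret a Python float as its 64-bit IEEE 754 pattern."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    return raw


def bits_to_float(raw):
    """Reinterpret a 64-bit pattern as a Python float."""
    (x,) = struct.unpack("<d", struct.pack("<Q", raw))
    return x


def discard_low_bits(x, n_bits):
    """Lossy step: zero the n_bits least significant mantissa bits of x."""
    if not 0 <= n_bits <= 52:
        raise ValueError("n_bits must be between 0 and 52")
    mask = ~((1 << n_bits) - 1) & MASK64
    return bits_to_float(float_to_bits(x) & mask)


def msb_context(x, n_bits):
    """Context for a direct model: the n_bits most significant bits of x."""
    if not 1 <= n_bits <= 64:
        raise ValueError("n_bits must be between 1 and 64")
    return float_to_bits(x) >> (64 - n_bits)


if __name__ == "__main__":
    sample = 3.141592653589793
    print(discard_low_bits(sample, 32))
    print(hex(msb_context(sample, 12)))
```

Zeroing low mantissa bits bounds the relative error by the number of bits discarded and leaves long runs of zero bits for the lossless stage to predict cheaply.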
TL;DR: This paper presents a novel technique with moderate computational complexity for estimating contextual symbol probabilities: the probability distribution is derived for each single pixel based on a soft context formation and is fed into a full-adaptive arithmetic coder.
Abstract: The challenge in data compression is to transmit or store a minimum number of bits per sample that should be as close as possible to the number of bits defined by the actual average information content. The latter can be approximated by the contextual entropy. The dilemma is generally that the optimal estimation of contextual symbol probabilities is a kind of artificial-intelligence problem and requires enormous computational efforts. Therefore, practical compression methods typically utilise some pre-knowledge about the data. In image compression, this concerns either autocorrelation properties of natural images, such as photographs, or assumptions about repeating patterns in synthetic data that often can be observed in screen content. This paper presents a novel technique for the estimation of the contextual symbol probabilities with moderate computational complexity. The probability distribution is derived for each single pixel based on a soft context formation and is fed into a full-adaptive arithmetic coder. Applied to synthetic images and images with mixed content up to 8000 colors, the proposed scheme shows bit-savings of about 20% compared to the compression with the HEVC reference software (HM-16.7+SCM-6.0), and it also can compete with methods based on context mixing.
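The contextual entropy mentioned in the abstract above can be approximated empirically. The following sketch is not the paper's soft context formation; it is a plain plug-in estimate of the order-k conditional entropy in bits per symbol, with the hypothetical function name `contextual_entropy`.

```python
import math
from collections import Counter, defaultdict


def contextual_entropy(symbols, order):
    """Empirical entropy (bits/symbol) of each symbol given its `order` predecessors."""
    counts = defaultdict(Counter)
    for i in range(order, len(symbols)):
        context = tuple(symbols[i - order:i])
        counts[context][symbols[i]] += 1

    total = sum(sum(c.values()) for c in counts.values())
    if total == 0:
        return 0.0

    entropy = 0.0
    for followers in counts.values():
        n_context = sum(followers.values())
        for k in followers.values():
            # Weight each conditional term by how often the (context, symbol) pair occurs.
            entropy -= (k / total) * math.log2(k / n_context)
    return entropy


if __name__ == "__main__":
    data = "abababab"
    print(contextual_entropy(data, 0))  # no context
    print(contextual_entropy(data, 1))  # previous symbol as context
```

On short inputs or with long contexts this estimate is optimistic, because most contexts are seen only a few times. That sparsity is one reason practical coders need more careful probability estimation than raw counts.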
TL;DR: Inpainting-based compression represents images in terms of a sparse subset of their pixel data; storing the carefully optimised positions of known data creates a lossless compression problem on sparse and often scattered binary images.
Abstract: Inpainting-based compression represents images in terms of a sparse subset of its pixel data. Storing the carefully optimised positions of known data creates a lossless compression problem on sparse and often scattered binary images. This central issue is crucial for the performance of such codecs. Since it has only received little attention in the literature, we conduct the first systematic investigation of this problem so far. To this end, we first review and compare a wide range of existing methods from image compression and general purpose coding in terms of their coding efficiency and runtime. Afterwards, an ablation study enables us to identify and isolate the most useful components of existing methods. With context mixing, we combine those ingredients into new codecs that offer either better compression ratios or a more favourable trade-off between speed and performance.