TL;DR: To improve the coding efficiency of neural video codecs (NVCs), this paper proposes increasing context diversity in both the temporal and spatial dimensions. The proposed method achieves a significant bitrate saving over the previous SOTA NVC and surpasses ECM, the next-generation traditional codec still under development.
Abstract: For any video codec, the coding efficiency relies heavily on whether the current signal to be encoded can find relevant contexts in the previously reconstructed signals. Traditional codecs have verified that more contexts bring substantial coding gain, but in a time-consuming manner. However, for the emerging neural video codec (NVC), the contexts are still limited, leading to a low compression ratio. To boost NVC, this paper proposes increasing the context diversity in both temporal and spatial dimensions. First, we guide the model to learn hierarchical quality patterns across frames, which enriches long-term and yet high-quality temporal contexts. Furthermore, to tap the potential of the optical flow-based coding framework, we introduce a group-based offset diversity where cross-group interaction is proposed for better context mining. In addition, this paper also adopts a quadtree-based partition to increase spatial context diversity when encoding the latent representation in parallel. Experiments show that our codec obtains 23.5% bitrate saving over previous SOTA NVC. Better yet, our codec has surpassed the under-development next-generation traditional codec, ECM, in both RGB and YUV420 colorspaces, in terms of PSNR. The code is available at https://github.com/microsoft/DCVC.
TL;DR: SZ3 is a modular, composable framework for error-bounded lossy compression in which compression modules can be plugged in easily to create new compressors based on the characteristics of the data and user requirements.
Abstract: Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to this problem. In practice, however, the best-fit compression method often needs to be customized or optimized because of the diverse characteristics of different datasets and the various user requirements on compression quality and performance. In this paper, we address this issue with a novel modular, composable compression framework named SZ3. Our contributions are fourfold. (1) We develop SZ3, which features an innovative modular abstraction for the prediction-based compression framework, such that compression modules can be plugged in easily to create new compressors based on characteristics of data and user requirements. (2) We create a new compression pipeline by SZ3 for GAMESS data, which significantly improves the compression ratios over state-of-the-art compressors. (3) We develop an adaptive compression pipeline by SZ3 for APS data with minimal effort, which leads to the best rate-distortion among all existing error-bounded lossy compressors for any bit-rate. (4) We compare the sustainability of SZ3 with leading error-bounded prediction-based compressors, and then demonstrate the necessity of diverse pipelines by integrating and evaluating several compression pipelines on diverse scientific datasets from multiple disciplines. Experiments show that SZ3 incurs very limited overhead in compressor integration and our customized compression pipelines lead to up to 20% improvement in compression ratios under the same data distortion, when compared with the best existing approach.
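To make the idea of prediction-based, error-bounded lossy compression concrete, the following is a minimal Python sketch. It is not SZ3's API or one of its pipelines: the function names, the previous-value predictor, and the uniform quantizer with bin width 2·eb are assumptions chosen purely for illustration. Each value is predicted from the previously reconstructed value, and the prediction residual is quantized so that the reconstruction error stays within the absolute bound eb (up to floating-point rounding). A real compressor would additionally entropy-code the integer codes.

```python
def compress(data, eb):
    """Return integer quantization codes for `data` under absolute error bound `eb`.

    Illustrative only: previous-value prediction plus uniform quantization.
    """
    codes = []
    prev = 0.0  # last *reconstructed* value, so encoder and decoder predict identically
    for x in data:
        q = round((x - prev) / (2 * eb))  # quantize the prediction residual
        prev = prev + 2 * eb * q          # the value the decoder will reconstruct
        codes.append(q)
    return codes


def decompress(codes, eb):
    """Rebuild the values from the quantization codes."""
    out = []
    prev = 0.0
    for q in codes:
        prev = prev + 2 * eb * q
        out.append(prev)
    return out
```

Predicting from reconstructed rather than original values is the design choice that keeps the bound from drifting: the decoder never sees the original data, so the encoder must mirror exactly what the decoder can compute.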
TL;DR: In this article, the authors provide a state-of-the-art survey of the principal time series compression techniques, proposing a taxonomy to classify them considering their overall approach and their characteristics.
Abstract: The presence of smart objects is increasingly widespread and their ecosystem, also known as the Internet of Things, is relevant in many different application scenarios. The huge amount of temporally annotated data produced by these smart devices demands efficient techniques for the transfer and storage of time series data. Compression techniques play an important role toward this goal and, although standard compression methods could be used with some benefit, there exist several that specifically address the case of time series by exploiting their peculiarities to achieve a more effective compression and, in the case of lossy compression techniques, a more accurate decompression. This paper provides a state-of-the-art survey of the principal time series compression techniques, proposing a taxonomy to classify them considering their overall approach and their characteristics. Furthermore, we analyze the performance of the selected algorithms by discussing and comparing the experimental results that were provided in the original articles. The goal of this paper is to provide a comprehensive and homogeneous reconstruction of the state of the art, which is currently fragmented across many papers that use different notations and where the proposed methods are not organized according to a classification.
TL;DR: In this article, the authors provide a clear analysis and review of data compression mechanisms in IoT-enabled wearable WSNs, including communication compression, sampling compression, and data compression techniques.
Abstract: The rapid proliferation of Wireless Sensor Networks (WSNs) and other linked devices has given rise to several notions that blend the virtual and real worlds, including a vision in which billions of intelligent objects are joined together to provide connectivity for anything, not just everyone. The data quantities gathered and transferred will expand significantly as the number of participants in the future Internet of Things grows, rendering traditional data gathering and processing methods inadequate. As a result, the volume of data should be decreased so that decision-makers can mine and evaluate such massive amounts of data. In the Internet of Things (IoT) dedicated to healthcare, various data may be collected from diverse body sensors, ambient sensors, and other data sources such as cameras, voice recorders, and so on. The processing, synchronization, aggregation, and compression of these heterogeneous data are crucial tasks for providing accurate real-time healthcare services. Energy efficiency imposes a strict limitation on wearable WSNs since wireless transmission consumes a large amount of power. Several compression approaches have been presented in the literature to tackle the issue of energy consumption. These approaches can be divided into three categories: communication compression, sampling compression, and data compression. Data compression mechanisms should lessen the data length and compress data using fewer resources. The peculiarities of the data should be addressed during the compression process. Irrelevant data might be eliminated depending on the user's capacity to use or comprehend such data. Data compression techniques are extensions of compression algorithms and data aggregation methods. Data compression algorithms play a substantial role in wireless body sensor networks (WBSNs), as the sensors in WBSNs have restricted memory and low battery power. Furthermore, data should be transmitted quickly and losslessly to provide real-time services.
Despite the availability of many review papers on data compression techniques in wireless sensor networks, there is a lack of surveys that identify gaps in existing data compression techniques, highlight areas for future research, and provide a comprehensive analysis of the current trends and practices in IoT-enabled WBSNs, primarily in the healthcare domain. This paper fills that gap by providing a clear analysis and review of data compression mechanisms in IoT-enabled WBSNs. We outline the main requirements for IoT-enabled WBSNs, existing methods, and state-of-the-art solutions. Furthermore, we evaluate the performance of the current techniques in the literature based on several criteria such as compression ratio, complexity, energy saved, minimized transmission, energy consumption, net energy saved, energy efficiency, reliability, and scalability. More importantly, we discuss how data compression methods could be a crucial enabler in solving many IoT problems. The paper also identifies open research problems and challenges for IoT-enabled WBSNs.
TL;DR: The goal of data compression is to reduce the number of bits needed to represent useful information, and neural compression is the application of neural networks and other machine learning methods to this task.
Abstract: The goal of data compression is to reduce the number of bits needed to represent useful information. Neural, or learned compression, is the application of neural networks and related machine learning techniques to this task. This monograph aims to serve as an entry point for machine learning researchers interested in compression by reviewing the prerequisite background and representative methods in neural compression. Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened up new possibilities for data compression, allowing compression algorithms to be learned end-to-end from data using powerful generative models such as normalizing flows, variational autoencoders, diffusion probabilistic models, and generative adversarial networks. This monograph introduces this field of research to a broader machine learning audience by reviewing the necessary background in information theory (e.g., entropy coding, rate-distortion theory) and computer vision (e.g., image quality assessment, perceptual metrics), and providing a curated guide through the essential ideas and methods in the literature thus far. Instead of surveying the vast literature, essential concepts and methods in neural compression are covered, with a reader in mind who is versed in machine learning but not necessarily data compression.
TL;DR: This paper reviews compressed sensing (CS) for physiological signals, including electroencephalography (EEG), electrocardiography (ECG), electromyography (EMG), and electrodermal activity (EDA).
Abstract: The immense progress in physiological signal acquisition and processing in health monitoring has allowed a better understanding of patient disease detection and diagnosis. With the increase in data volume and power consumption, effective data compression, signal acquisition, transmission, and processing techniques are essential, especially in telemonitoring healthcare applications. An emerging research area focuses on integrating compressed sensing (CS) with physiological signals to cope with massive amounts of physiological data, limited transmission bandwidth, and power-saving requirements. A review of CS for physiological signals is presented in this article, including electroencephalography (EEG), electrocardiography (ECG), electromyography (EMG), and electrodermal activity (EDA), focusing on the pros and cons of CS in treating such signals and the suitability of CS for hardware implementation. Furthermore, we emphasize performance metrics, such as compression ratio (CR), signal-to-noise ratio (SNR), Percentage Root-mean-square Difference (PRD), and processing time, to evaluate the performance of CS. We also investigate the current practices, challenges, and opportunities of using CS in healthcare applications.
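The metrics named in this abstract can be sketched in a few lines of Python. The definitions below are common textbook forms and are an assumption here, since the abstract does not spell them out; in particular, some papers subtract the signal mean before computing PRD, and CR is sometimes reported as a percentage instead of a ratio. The function names are illustrative.

```python
import math


def compression_ratio(n_original, n_compressed):
    """CR as original size over compressed size (e.g. samples or bits)."""
    return n_original / n_compressed


def prd(x, x_hat):
    """Percentage root-mean-square difference between a signal and its reconstruction."""
    err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    energy = sum(a ** 2 for a in x)
    return 100.0 * math.sqrt(err / energy)


def snr_db(x, x_hat):
    """Signal-to-noise ratio of the reconstruction, in decibels."""
    energy = sum(a ** 2 for a in x)
    err = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return 10.0 * math.log10(energy / err)
```

With these particular definitions the two quality metrics carry the same information, because SNR = −20·log10(PRD/100); reporting both is a convention rather than an independent check.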
TL;DR: Adaptable semantic compression and resource allocation for task-oriented communications optimize compression ratio and resource allocation to maximize task success probability.
Abstract: Task-oriented communication is a new paradigm that aims at providing efficient connectivity for accomplishing intelligent tasks rather than the reception of every transmitted bit. This paper proposes a task-oriented communication architecture for end-to-end semantics transmission, where the extracted semantics are compressed by the proposed adaptable semantic compression (ASC) method. However, accommodating multiple users in a delay-intolerant system poses a challenge. Higher compression ratios conserve channel resources but cause semantic distortion, while lower ratios demand more resources and may lead to transmission failure due to delay constraints. To address this, we optimize both compression ratio and resource allocation to maximize task success probability. Specifically, we propose a compression ratio and resource allocation (CRRA) algorithm that separates the problem into two subproblems and solves them iteratively. Furthermore, for scenarios with varying service levels among users, a compression ratio, resource allocation, and user selection (CRRAUS) algorithm is proposed, adaptively selecting users through a branch-and-bound method. Simulation results show that the ASC approach can reduce the size of transmitted data by up to 80% without compromising task success probability. Furthermore, numerical results clearly demonstrate that both the proposed CRRA and CRRAUS algorithms lead to substantial improvements in terms of success gains when compared to the baselines.
TL;DR: In this article, a quantization-aware posterior and prior are proposed to enable quantization and entropy coding for image compression; the model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding.
Abstract: Recent work has shown a strong theoretical connection between variational autoencoders (VAEs) and the rate distortion theory. Motivated by this, we consider the problem of lossy image compression from the perspective of generative modeling. Starting from ResNet VAEs, which are originally designed for data (image) distribution modeling, we redesign their latent variable model using a quantization-aware posterior and prior, enabling easy quantization and entropy coding for image compression. Along with improved neural network blocks, we present a powerful and efficient class of lossy image coders, outperforming previous methods on natural image (lossy) compression. Our model compresses images in a coarse-to-fine fashion and supports parallel encoding and decoding, leading to fast execution on GPUs. Code is made available online.
TL;DR: In this paper, the authors present optimizations on VCA for faster and energy-efficient video complexity analysis, using eight CPU threads, Single Instruction Multiple Data (SIMD), and low-pass DCT optimization.
Abstract: For adaptive streaming applications, low-complexity and accurate video complexity features are necessary to analyze the video content in real time, which ensures fast and compression-efficient video streaming without disruptions. State-of-the-art video complexity features are Spatial Information (SI) and Temporal Information (TI) features, which do not correlate well with the encoding parameters in adaptive streaming applications. In this light, Video Complexity Analyzer (VCA) was introduced, determining the features based on Discrete Cosine Transform (DCT)-energy. This paper presents optimizations on VCA for faster and energy-efficient video complexity analysis. Experimental results show that VCA v2.0, using eight CPU threads, Single Instruction Multiple Data (SIMD), and low-pass DCT optimization, determines seven complexity features of Ultra High Definition 8-bit videos with better accuracy at a speed of up to 292.68 fps and an energy consumption 97.06% lower than the reference SITI implementation.
TL;DR: In this article, a rate-distortion optimized quantization (RDOQ) method is proposed to improve the coding efficiency of predicting transform (PT) by adding the option of forcing the quantized residuals to zero.
Abstract: Limited by the network bandwidth, three-dimensional (3D) point clouds need to be efficiently compressed before transmission. As one of the three attribute coding methods adopted in the geometry-based point cloud compression (G-PCC) standard developed by MPEG, predicting transform (PT) has received increasing attention. To further improve the coding efficiency of PT, we propose a rate-distortion optimized quantization (RDOQ) in which an additional option for quantization results is added by forcing the quantized residuals to zero. Rate-distortion optimization is then used to determine whether the final quantized residual should be zero or nonzero. Experimental results show that average BD rates of −0.5%, −3.0%, and −3.0% can be achieved for the Luma, Chroma Cb, and Chroma Cr components, respectively, with a negligible increase in time complexity.
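The zero-or-nonzero decision described above can be illustrated with a small Python sketch based on the usual Lagrangian cost J = D + λ·R. This is not the G-PCC reference implementation: the function names, the squared-error distortion, and especially the bit-count model in `estimate_bits` are hypothetical stand-ins, since the abstract does not specify how rate is estimated.

```python
def estimate_bits(level):
    """Hypothetical rate model (an assumption, not the G-PCC entropy coder):
    1 bit for a zero level, otherwise an Exp-Golomb-like length plus a sign bit."""
    if level == 0:
        return 1
    return 2 * (abs(level).bit_length() - 1) + 3


def rdoq(residual, step, lam):
    """Choose between the ordinary quantized level and a forced zero
    by comparing the rate-distortion costs J = D + lam * R."""
    level = round(residual / step)  # ordinary scalar quantization
    if level == 0:
        return 0

    def cost(candidate):
        distortion = (residual - candidate * step) ** 2
        return distortion + lam * estimate_bits(candidate)

    return level if cost(level) <= cost(0) else 0
```

For a residual of 0.6 with step 1.0, this sketch keeps level 1 when `lam` is 0.05 but forces zero when `lam` is 0.2, showing how a larger Lagrange multiplier trades distortion for rate.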
TL;DR: Experimental results demonstrate that compared to the state-of-the-art video coding standard Versatile Video Coding (VVC) as well as the latest generative compression schemes, the proposed scheme is superior in terms of both objective and subjective quality at the same bitrate.
Abstract: In this paper, we propose to compactly represent the nonlinear dynamics along the temporal trajectories for talking face video compression. By projecting the frames into a high dimensional space, the temporal trajectories of talking face frames, which are complex, non-linear and difficult to extrapolate, are implicitly modelled in an end-to-end inference framework based upon very compact feature representation. As such, the proposed framework is suitable for ultra-low bandwidth video communication and can guarantee the quality of the reconstructed video in such applications. The proposed compression scheme is also robust against large head-pose motions, due to the delicately designed dynamic reference refresh and temporal stabilization mechanisms. Experimental results demonstrate that compared to the state-of-the-art video coding standard Versatile Video Coding (VVC) as well as the latest generative compression schemes, our proposed scheme is superior in terms of both objective and subjective quality at the same bitrate. The project page can be found at https://github.com/Berlin0610/CTTR.
TL;DR: In this article, a long-range convolution compression network (LRCompNet) is proposed for remote sensing images, together with an improved non-local attention model that reduces computational complexity to accommodate remote sensing image compression.
TL;DR: The authors propose AudioDec, an open-source, streamable, and real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency.
Abstract: A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e. the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e. encoding and decoding the signal needs to be fast enough to enable communication without or with only minimal noticeable delay; and (3) reconstruction quality of the signal. In this work, we propose an open-source, streamable, and real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency. An efficient training paradigm is also demonstrated for developing such neural audio codecs for real-world scenarios. Both objective and subjective evaluations using the VCTK corpus are provided. To sum up, AudioDec is a well-developed plug-and-play benchmark for audio codec applications.
TL;DR: In this article, the authors propose Espresso, which uses a decision tree abstraction to express any compression strategy, empirical models to derive the intricate interactions among tensors, and a compression decision algorithm that analyzes tensor interactions to eliminate and prioritize strategies and optimally offloads compression from GPUs to CPUs.
Abstract: Gradient compression (GC) is a promising approach to addressing the communication bottleneck in distributed deep learning (DDL). It saves communication time, but also incurs additional computation overheads. The training throughput of compression-enabled DDL is determined by the compression strategy, including whether to compress each tensor, the type of compute resources (e.g., CPUs or GPUs) for compression, the communication schemes for compressed tensors, and so on. However, it is challenging to find the optimal compression strategy for applying GC to DDL because of the intricate interactions among tensors. To fully unleash the benefits of GC, two questions must be addressed: 1) How to express any compression strategies and the corresponding interactions among tensors of any DDL training job? 2) How to quickly select a near-optimal compression strategy? In this paper, we propose Espresso to answer these questions. It first designs a decision tree abstraction to express any compression strategies and develops empirical models to timeline tensor computation, communication, and compression to enable Espresso to derive the intricate interactions among tensors. It then designs a compression decision algorithm that analyzes tensor interactions to eliminate and prioritize strategies and optimally offloads compression from GPUs to CPUs. Experimental evaluations show that Espresso can improve the training throughput over the state-of-the-art compression-enabled system by up to 77% for representative DDL training jobs. Moreover, the computational time needed to select the compression strategy is measured in milliseconds, and the selected strategy is only a few percent from optimal.
TL;DR: The authors propose an advanced learned video compression (ALVC) approach with an in-loop frame prediction module, which is able to effectively predict the target frame from the previously compressed frames without consuming any bit-rate.
Abstract: Recent years have witnessed an increasing interest in end-to-end learned video compression. Most previous works explore temporal redundancy by detecting and compressing a motion map to warp the reference frame towards the target frame. Yet, they fail to adequately take advantage of the historical priors in the sequential reference frames. In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with an in-loop frame prediction module, which is able to effectively predict the target frame from the previously compressed frames, without consuming any bit-rate. The predicted frame can serve as a better reference than the previously compressed frame, and therefore it benefits the compression performance. The proposed in-loop prediction module is a part of the end-to-end video compression and is jointly optimized in the whole framework. We propose the recurrent and the bi-directional in-loop prediction modules for compressing P-frames and B-frames, respectively. The experiments show the state-of-the-art performance of our ALVC approach in learned video compression. We also outperform the default hierarchical B mode of x265 in terms of PSNR and beat the slowest mode of the SSIM-tuned x265 on MS-SSIM. The project page: https://github.com/RenYang-home/ALVC.
TL;DR: In this article, a semantic-aware (SA) video compression (SAC) framework is proposed that compresses, separately and simultaneously, the region-of-interest and region-out-of-interest of automotive camera video frames before transmitting them to processing unit(s), where the data are used for perception tasks such as object detection and semantic segmentation.
Abstract: Assisted and automated driving functions in vehicles exploit sensor data to build situational awareness; however, the amount of data required by these functions might exceed the bandwidth of current wired vehicle communication networks. Consequently, sensor data reduction and automotive camera video compression need investigation. However, conventional video compression schemes, such as H.264 and H.265, have been mainly optimised for human vision. In this paper, we propose a semantic-aware (SA) video compression (SAC) framework that compresses separately and simultaneously region-of-interest and region-out-of-interest of automotive camera video frames, before transmitting them to processing unit(s), where the data are used for perception tasks, such as object detection, semantic segmentation, etc. Using our newly proposed technique, the region-of-interest (ROI), encapsulating most of the road stakeholders, retains higher quality using a lower compression ratio. The experimental results show that under the same overall compression ratio, our proposed SAC scheme maintains a similar or better image quality, measured according to traditional metrics and to our newly proposed semantic-aware metrics. The newly proposed metrics, namely SA-PSNR, SA-SSIM, and iIoU, give more emphasis to ROI quality, which has an immediate impact on the planning and decisions of assisted and automated driving functions. Using our SA-X264 compression, SA-PSNR and SA-SSIM have an increase of 2.864 and 0.008 respectively compared to traditional H.264, with higher ROI quality and the same compression ratio. Finally, a segmentation-based perception algorithm has been used to compare reconstructed frames, demonstrating a 2.7% mIOU improvement, when using the proposed SAC method versus traditional compression techniques.
TL;DR: In this article, the authors propose a secure compression method incorporating a secret key to solve the problem of simultaneously compressing and encrypting data without affecting the efficacy of either process. However, their technique is susceptible to the secret key and plaintext, as measured by the unicity distance.
Henry Gilbert, Michael Sandborn, Douglas C. Schmidt, Jesse Spencer-Smith, Jules White
21 Nov 2023
TL;DR: Large language models can compress text effectively, but their limited token capacity restricts their effectiveness on tasks requiring processing large sets or continuous streams of information. Approximate compression techniques using LLMs show promise in addressing this limitation.
Abstract: The rise of large language models (LLMs) is revolutionizing information retrieval, question answering, summarization, and code generation tasks. However, in addition to confidently presenting factually inaccurate information at times (known as “hallucinations”), LLMs are also inherently limited by the number of input and output tokens that can be processed at once, making them potentially less effective on tasks that require processing a large set or continuous stream of information. A common approach to reducing the size of data is through lossless or lossy compression. Yet, in some cases it may not be strictly necessary to perfectly recover every detail from the original data, as long as a requisite level of semantic precision or intent is conveyed. This paper presents three contributions to research on LLMs. First, we present the results from experiments exploring the viability of “approximate compression” using LLMs, focusing specifically on GPT-3.5 and GPT-4 via ChatGPT interfaces. Second, we investigate and quantify the capability of LLMs to compress text. Third, we present two novel metrics, Exact Reconstructive Effectiveness (ERE) and Semantic Reconstruction Effectiveness (SRE), that quantify the level of preserved intent between text compressed and decompressed by the LLMs we studied. Our initial results indicate that GPT-4 can effectively compress and reconstruct text while preserving the semantic essence of the original text, providing a path to leverage more tokens than current limits allow.
Joo Chan Lee, Daniel Rho, Jong Hwan Ko, Eunbyung Park
26 Oct 2023
TL;DR: FFNeRV is a novel method for video compression and frame interpolation using frame-wise neural representations that incorporates flow information and a fully convolutional architecture. It outperforms standard video codecs and achieves performance comparable to state-of-the-art algorithms.
Abstract: Neural fields, also known as coordinate-based or implicit neural representations, have shown a remarkable capability of representing, generating, and manipulating various forms of signals. For video representations, however, mapping pixel-wise coordinates to RGB colors has shown relatively low compression performance and slow convergence and inference speed. Frame-wise video representation, which maps a temporal coordinate to its entire frame, has recently emerged as an alternative method to represent videos, improving compression rates and encoding speed. While promising, it has still failed to reach the performance of state-of-the-art video compression algorithms. In this work, we propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos inspired by the standard video codecs. Furthermore, we introduce a fully convolutional architecture, enabled by one-dimensional temporal grids, improving the continuity of spatial features. Experimental results show that FFNeRV yields the best performance for video compression and frame interpolation among the methods using frame-wise representations or neural fields. To reduce the model size even further, we devise a more compact convolutional architecture using the group and pointwise convolutions. With model compression techniques, including quantization-aware training and entropy coding, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
TL;DR: The authors design a 3-D trained wavelet-like transform to enable a signal-dependent and non-separable transform, and introduce an affine wavelet basis to capture the various local correlations in different regions of volumetric images.
Abstract: Volumetric image compression has become an urgent task to effectively transmit and store images produced in biological research and clinical practice. At present, the most commonly used volumetric image compression methods are based on the wavelet transform, such as JP3D. However, JP3D employs an ideal, separable, global, and fixed wavelet basis to convert input images from the pixel domain to the frequency domain, which seriously limits its performance. In this paper, we first design a 3-D trained wavelet-like transform to enable a signal-dependent and non-separable transform. Then, an affine wavelet basis is introduced to capture the various local correlations in different regions of volumetric images. Furthermore, we embed the proposed wavelet-like transform into an end-to-end compression framework called aiWave to enable an adaptive compression scheme for various datasets. Last but not least, we introduce weight sharing strategies for the affine wavelet-like transform according to the volumetric data characteristics in the axial direction to reduce the number of parameters. The experimental results show that: 1) when combining our trained 3-D affine wavelet-like transform with a simple factorized entropy coding module, aiWave performs better than JP3D and is comparable in terms of encoding and decoding complexities; 2) when adding a context module to remove signal redundancy further, aiWave can achieve a much better performance than HEVC.
Shishira R Maiya, Sharath Girish, Max Ehrlich, Hanyu Wang, Kwot Sin Lee, Patrick Poirson, Pengxiang Wu, Chen Wang, Abhinav Shrivastava
1 Jun 2023
TL;DR: NIRVANA is a novel video INR method that exploits temporal redundancy and autoregressive patch-wise modeling to achieve high-quality video compression with improved encoding speed and scalability.
Abstract: Implicit Neural Representations (INR) have recently been shown to be a powerful tool for high-quality video compression. However, existing works are limited as they do not exploit the temporal redundancy in videos, leading to a long encoding time. Additionally, these methods have fixed architectures which do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats videos as groups of frames and fits separate networks to each group performing patch-wise prediction. The video representation is modeled autoregressively, with networks fit on a current group initialized using weights from the previous group's model. To enhance efficiency, we quantize the parameters during training, requiring no post-hoc pruning or quantization. When compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and the encoding speed by 12x, while maintaining the same compression rate. In contrast to prior video INR works which struggle with larger resolution and longer videos, we show that our algorithm scales naturally due to its patch-wise and autoregressive design. Moreover, our method achieves variable bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA also achieves 6x decoding speed, scaling well with more GPUs, making it practical for various deployment scenarios.
TL;DR: Wang et al. as mentioned in this paper proposed an outlier-processable attention-based asymmetric compression algorithm to tackle the outlier signals in heterogeneous edge-cloud learning framework, where paralleled methods compress normal data and outlier data distinctively based on their different structure information.
Abstract: Spectrum data compression with a high compression rate and accurate reconstruction is of crucial importance for reducing the ultra-large data transmission from the edge sensors to the cloud for establishing high-quality spectrum maps. However, the current methods ignore the imbalanced edge-cloud computation resources and cannot tackle the outlier signals, resulting in significant challenges for achieving effective compression. Therefore, we develop an efficient heterogeneous edge-cloud learning framework. In the framework, parallel methods compress normal data and outlier data distinctively based on their different structure information. Meanwhile, those methods are asymmetric for achieving low-cost compression at the edge and accurate reconstruction on the cloud. Based on the framework, we propose an outlier-processable attention-based asymmetric compression algorithm. A novel attention-based asymmetric convolutional neural network performs the normal data compression while a non-linear outlier compression algorithm realizes the outlier data compression. Compared with the state-of-the-art schemes in real-world settings, our proposed framework’s convergence speed increases by 120%. Meanwhile, our framework’s reconstruction accuracy increases by 68.42% in interfered environments while maintaining superior compression speed and comprehensive performance. We also confirm our framework’s generalization ability to transfer among different tasks by deploying it under various spectrum environments.
TL;DR: In this article, a hybrid approach combining advanced steganography, wavelet transform (WT), and lossless compression was developed to ensure the privacy and storage of patient data through enhanced security and optimized storage of large data images, allowing a pharmacologist to store twice as much information in the same storage space.
Abstract: Due to rapidly developing technology and new research innovations, privacy and data preservation are paramount, especially in the healthcare industry. At the same time, the storage of large volumes of data in medical records should be minimized. Recently, several types of research on lossless medically significant data compression and various steganography methods have been conducted. This research develops a hybrid approach with advanced steganography, wavelet transform (WT), and lossless compression to ensure privacy and storage. This research focuses on preserving patient data through enhanced security and optimized storage of large data images that allow a pharmacologist to store twice as much information in the same storage space in an extensive data repository. Safe storage, fast image service, and minimum computing power are the main objectives of this research. This work uses a fast and smooth knight tour (KT) algorithm to embed patient data into medical images and a discrete WT (DWT) to protect shield images. In addition, lossless packet compression is used to minimize memory footprints and maximize memory efficiency. JPEG formats’ compression ratio percentages are slightly higher than those of PNG formats. When image size increases, that is, for high-resolution images, the compression ratio lies between 7% and 7.5%, and the compression percentage lies between 30% and 37%. The proposed model increases the expected compression ratio and percentage compared to other models. The average compression ratio lies between 7.8% and 8.6%, and the expected compression ratio lies between 35% and 60%. Compared to state-of-the-art methods, this research results in greater data security without compromising image quality. Reducing images makes them easier to process and allows many images to be saved in archives.
TL;DR: FAZ as mentioned in this paper is a flexible and adaptive error-bounded lossy compression framework, which projects a fairly high capability of adapting to diverse datasets and can always keep the compression quality at the best level compared with other state-of-the-art compressors for different datasets.
Abstract: Error-bounded lossy compression has been effective to resolve the big scientific data issue because it has a great potential to significantly reduce the data volume while allowing users to control data distortion based on specified error bounds. However, none of the existing error-bounded lossy compressors can always obtain the best compression quality because of the diverse characteristics of different datasets. In this paper, we develop FAZ, a flexible and adaptive error-bounded lossy compression framework, which projects a fairly high capability of adapting to diverse datasets. FAZ can always keep the compression quality at the best level compared with other state-of-the-art compressors for different datasets. We perform a comprehensive evaluation using 6 real-world scientific applications and 6 other state-of-the-art error-bounded lossy compressors. Experiments show that compared with the other existing lossy compressors, FAZ can improve the compression ratio by up to 120%, 190%, and 75% when setting the same error bound, the same PSNR and the same SSIM, respectively.
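The notion of an error bound used above can be illustrated with a minimal sketch. This is not FAZ's pipeline; it is a generic prediction-plus-quantization scheme, assuming a previous-value predictor and linear-scale quantization with bins of width twice the error bound. The function names `compress` and `decompress` are hypothetical, and a real compressor would follow the integer codes with an entropy coder.

```python
def compress(values, error_bound):
    """Predict each value from the previous reconstruction and quantize the residual."""
    codes = []
    prev = 0.0  # the decoder starts from the same initial prediction
    for v in values:
        # Linear-scale quantization: bins of width 2 * error_bound,
        # so rounding to the nearest bin leaves an error of at most error_bound.
        code = round((v - prev) / (2 * error_bound))
        codes.append(code)
        # Track the reconstructed value (not the original) so that the
        # encoder and decoder predictions never drift apart.
        prev = prev + code * 2 * error_bound
    return codes


def decompress(codes, error_bound):
    """Rebuild the values by replaying the same predictions."""
    out = []
    prev = 0.0
    for code in codes:
        prev = prev + code * 2 * error_bound
        out.append(prev)
    return out


data = [1.0, 1.05, 1.2, 0.9, 3.14159, -2.5]
recon = decompress(compress(data, 0.01), 0.01)
print(max(abs(a - b) for a, b in zip(data, recon)))
```

Smooth data produces many small, repeated codes, which is what makes the subsequent lossless stage effective; the user-specified bound controls the trade-off between distortion and code entropy.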
TL;DR: In this article, a dynamic adaptive light field video transmission scheme that can achieve high compression and yet provide near-distortion-free LF video when the network condition is stable is proposed.
Abstract: In recent years, Light Field (LF) video has grabbed much attention as an emerging form of immersive media. LF collects, through a lens matrix, light information emanating in every direction, and obtains rich information about the scene, providing users with an immersive 6 Degrees of Freedom (DoF) experience. The visual content between different viewpoints is highly homogenized, suggesting the possibility of good compression and encoding. However, most fixed-structure LF coding schemes are difficult to adapt to the real-time requirements of different LF applications and best-effort network conditions causing packet loss. In this paper, we propose a dynamic adaptive LF video transmission scheme that can achieve high compression and yet provide near-distortion-free LF video when the network condition is stable. Additionally, for unstable network conditions a description scheduling algorithm is proposed, which can decode the LF video with the highest possible quality even if partial data cannot be received completely and/or timely. We achieve this by designing a Multiple Description Coding (MDC) based solution to transport the LF video compressed by a Graph Neural Network (GNN) model. Experimental results show that the scheduling algorithm can improve the quality of the decoding results by 3% to 15%. Compared with other similar schemes, our system greatly improves the reliability of the video streaming system against packet loss/error and supports heterogeneous receivers.
TL;DR: A stacked convolutional RBM auto-encoder (stacked CAE) model for compressing sensor data, which is made up of layers: an encode layer and a decode layer, both of which are discussed.
TL;DR: In this paper, the compression dimension of the bit cost of a single sample point is introduced into the fault-mechanism-based method in order to improve the compression ratio further.
Abstract: In condition monitoring for rolling bearings, methods based on vibration data have achieved good diagnostic performance and clear mechanistic interpretation. The high sampling frequency of data collection preserves fault characteristics but brings the problem of big data. An effective way to reduce this problem is to apply data compression. However, it is difficult to improve the compression ratio further without affecting the diagnostic performance of the data. Inspired by the binarization method, the compression dimension of the bit cost of a single sample point is first introduced into the fault-mechanism-based method in this article. On this basis, a three-dimensional data compression method is proposed, and it is subsequently validated with two real-bearing datasets. Two performance metrics, including a newly defined one, are utilized to compare the proposed method with the five existing methods. The comparison results show that the proposed method significantly improves the compression ratio of data but maintains good diagnostic performance.
TL;DR: This article proposed an erasing-based lossless floating-point compression algorithm, i.e., Elf, which can directly determine the erased bits and restore the original values without losing any precision.
Abstract: There are a prohibitively large number of floating-point time series data generated at an unprecedentedly high rate. An efficient, compact and lossless compression for time series data is of great importance for a wide range of scenarios. Most existing lossless floating-point compression methods are based on the XOR operation, but they do not fully exploit the trailing zeros, which usually results in an unsatisfactory compression ratio. This paper proposes an Erasing-based Lossless Floating-point compression algorithm, i.e., Elf. The main idea of Elf is to erase the last few bits (i.e., set them to zero) of floating-point values, so the XORed values are supposed to contain many trailing zeros. The challenges of the erasing-based method are three-fold. First, how to quickly determine the erased bits? Second, how to losslessly recover the original data from the erased ones? Third, how to compactly encode the erased data? Through rigorous mathematical analysis, Elf can directly determine the erased bits and restore the original values without losing any precision. To further improve the compression ratio, we propose a novel encoding strategy for the XORed values with many trailing zeros. Elf works in a streaming fashion. It takes only O(N) (where N is the length of a time series) in time and O(1) in space, and achieves a notable compression ratio with a theoretical guarantee. Extensive experiments using 22 datasets show the powerful performance of Elf compared with 9 advanced competitors.
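The erasing idea can be illustrated with a simplified sketch. It is not the Elf algorithm itself: Elf determines the erased bits directly through mathematical analysis, whereas this version assumes the number of decimal places of each value is known and simply tries successively larger erasures, keeping the largest one for which decimal rounding still recovers the original. All function names here are hypothetical.

```python
import struct


def to_bits(x):
    """IEEE 754 double -> 64-bit unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(b):
    """64-bit unsigned integer -> IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def trailing_zeros(b):
    """Number of trailing zero bits in a 64-bit pattern."""
    if b == 0:
        return 64
    return (b & -b).bit_length() - 1


def erase(v, decimals):
    """Zero as many low mantissa bits as possible while rounding still recovers v.

    Assumes v has at most `decimals` decimal places (an assumption of this sketch).
    """
    bits = to_bits(v)
    best = bits
    for k in range(1, 53):  # a double has 52 explicit mantissa bits
        candidate = (bits >> k) << k
        if round(from_bits(candidate), decimals) != v:
            break
        best = candidate
    return best


def restore(bits, decimals):
    """Recover the original value by rounding to the known decimal precision."""
    return round(from_bits(bits), decimals)


values = [3.17, 3.18, 3.2, 3.25]
erased = [erase(v, 2) for v in values]
for a, b in zip(erased, erased[1:]):
    # XOR of consecutive erased values keeps a long run of trailing zeros.
    print(trailing_zeros(a ^ b))
```

Because each erased pattern ends in a long run of zeros, the XOR of neighbouring values does too, which is the property an XOR-based encoder can exploit.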
TL;DR: In this article, a Gaussian mixture model (GMM) is used to estimate the latent representation distribution and the entropy model parameter is estimated by combining the local context, global context, and hyperprior information.
Abstract: The synthetic aperture radar (SAR) image is widely used in many remote sensing applications. In order to store and transmit the increasing SAR image data, more efficient compression algorithms are needed. The purpose of this letter is to introduce a new framework for compressing SAR images. First, we propose a novel analysis and synthesis transform based on multi-Resblocks for transforming the original SAR image into a compact latent representation. Then, a Gaussian mixture model (GMM) is used to estimate the latent representation’s distribution. In order to explore the redundancy within the latent representation, the entropy model parameter is estimated by combining the local context, global context, and hyperprior information. In order to evaluate the performance of the proposed algorithm, we conduct experiments on a dataset of SAR images. The results show that the proposed algorithm outperforms JPEG2000 and some state-of-the-art learned image compression schemes in terms of compression performance.
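How a GMM entropy model turns into a bit-rate estimate can be sketched as follows. This is an illustrative example of a discretized Gaussian mixture as commonly used in learned image compression, not the letter's implementation: in the letter the mixture parameters are estimated from local context, global context, and hyperprior information, while here `weights`, `means`, and `scales` are plain lists supplied by hand, and the function names are hypothetical.

```python
import math


def normal_cdf(x, mean, scale):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def gmm_bits(y, weights, means, scales):
    """Estimated bits for an integer-quantized latent y under a discretized GMM.

    The probability of y is the mixture mass on the interval [y - 0.5, y + 0.5].
    """
    p = 0.0
    for w, m, s in zip(weights, means, scales):
        p += w * (normal_cdf(y + 0.5, m, s) - normal_cdf(y - 0.5, m, s))
    p = max(p, 1e-9)  # guard against log of zero for very unlikely symbols
    return -math.log2(p)


# A latent close to a mixture mean is cheap; one far from every mean is expensive.
weights, means, scales = [0.6, 0.4], [0.0, 2.0], [1.0, 0.5]
print(gmm_bits(0, weights, means, scales), gmm_bits(6, weights, means, scales))
```

Summing `gmm_bits` over all latent elements gives the rate term that such a codec minimizes during training, so sharper and better-centred mixtures translate directly into fewer bits.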
TL;DR: Wang et al. as mentioned in this paper proposed an end-to-end coding scheme for varifocal multiview images, which provides a new paradigm for VFMV compression from data acquisition (source) end to vision application end.
Abstract: Varifocal multiview (VFMV), an emerging data type, has an exciting prospect in immersive multimedia. However, the distinctive data redundancy of VFMV derived from dense arrangements and blurriness differences among views causes difficulty in data compression. In this paper, we propose an end-to-end coding scheme for VFMV images, which provides a new paradigm for VFMV compression from data acquisition (source) end to vision application end. VFMV acquisition is first conducted in three ways at the source end, including conventional imaging, plenoptic refocusing, and 3D creation. The acquired VFMV has irregular focusing distributions due to varying focal planes, which decreases the similarity among adjacent views. To improve the similarity and the consequent coding efficiency, we rearrange the irregular focusing distributions in descending order and accordingly reorder the horizontal views. Then, the reordered VFMV images are scanned and concatenated as video sequences. We propose 4-directional prediction (4DP) to compress the reordered VFMV video sequences. The four most similar adjacent views from the left, upper left, upper and upper right directions serve as reference frames to improve the prediction efficiency. Finally, the compressed VFMV is transmitted and decoded at the application end, benefiting potential vision applications. Extensive experiments demonstrate that the proposed coding scheme is superior to the comparison scheme in objective quality, subjective quality and computational complexity. Experiments on new view synthesis show that VFMV can achieve a more extended depth of field than conventional multiview at the application end. Validation experiments show the effectiveness of view reordering, the advantage over typical MV-HEVC, and the flexibility on other data types, respectively.
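The reordering and reference-selection steps can be sketched roughly as follows. This is a simplified illustration rather than the paper's implementation: it assumes each view in a row carries a single scalar focus score (a hypothetical stand-in for the focusing distribution) and that views sit on a regular row-by-column grid. The function names are also hypothetical.

```python
def reorder_row(views, focus):
    """Reorder one horizontal row of views by descending focus score."""
    order = sorted(range(len(views)), key=lambda i: focus[i], reverse=True)
    return [views[i] for i in order]


def reference_positions(row, col, n_cols):
    """Grid positions of the 4DP references: left, upper left, upper, upper right.

    Positions that fall outside the grid are dropped, so views on the first
    row or at the row edges have fewer than four references.
    """
    candidates = [
        (row, col - 1),      # left
        (row - 1, col - 1),  # upper left
        (row - 1, col),      # upper
        (row - 1, col + 1),  # upper right
    ]
    return [(r, c) for r, c in candidates if r >= 0 and 0 <= c < n_cols]


print(reorder_row(["a", "b", "c"], [0.2, 0.9, 0.5]))
print(reference_positions(2, 2, 4))
```

All four reference positions precede the current view in a row-by-row scan, so they are already decoded when the current view is predicted.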