Open Access • Posted Content
DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks
Simon Wiedemann, Heiner Kirchoffer, Stefan Matlage, Paul Haase, Arturo Marban, Talmaj Marinc, David Neumann, Tung Nguyen, Ahmed Osman, Detlev Marpe, Heiko Schwarz, Thomas Wiegand, Wojciech Samek
TL;DR: DeepCABAC is a universal compression algorithm for DNNs that applies a Context-based Adaptive Binary Arithmetic Coder (CABAC) to the DNN parameters and employs a novel quantization scheme that minimizes a rate-distortion function while taking the impact of quantization on the DNN's performance into account.
Abstract: The field of video compression has developed some of the most sophisticated and efficient compression algorithms known in the literature, enabling very high compressibility for little loss of information. Whilst some of these techniques are domain specific, many of their underlying principles are universal in that they can be adapted and applied for compressing different types of data. In this work we present DeepCABAC, a compression algorithm for deep neural networks that is based on one of the state-of-the-art video coding techniques. Concretely, it applies a Context-based Adaptive Binary Arithmetic Coder (CABAC) to the network's parameters; CABAC was originally designed for the H.264/AVC video coding standard and became the state of the art for lossless compression. Moreover, DeepCABAC employs a novel quantization scheme that minimizes the rate-distortion function while simultaneously taking the impact of quantization on the accuracy of the network into account. Experimental results show that DeepCABAC consistently attains higher compression rates than previously proposed coding techniques for neural network compression. For instance, it is able to compress the VGG16 ImageNet model by a factor of 63.6 with no loss of accuracy, thus being able to represent the entire network with merely 8.7 MB. The source code for encoding and decoding can be found at this https URL.
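To make the idea of rate-distortion quantization concrete, here is a minimal Python sketch. It is not the DeepCABAC quantizer: the toy bit-cost model `rate_bits`, the per-weight `importance` factor, the candidate search width, and all names are assumptions introduced purely for illustration. Each weight is mapped to the grid index that minimizes a weighted squared error plus `lam` times an estimated bit cost.

```python
def rate_bits(i: int) -> int:
    """Toy rate model (an assumption): zero costs 1 bit, larger magnitudes cost more."""
    return 1 + 2 * abs(i).bit_length()


def rd_quantize(w: float, step: float, lam: float,
                importance: float = 1.0, search: int = 2) -> int:
    """Return the integer index i minimizing importance*(w - i*step)^2 + lam*rate_bits(i).

    Only indices within `search` steps of the nearest grid point are considered.
    `importance` is a hypothetical stand-in for how strongly this weight
    affects the network's accuracy.
    """
    nearest = round(w / step)
    candidates = range(nearest - search, nearest + search + 1)

    def cost(i: int) -> float:
        distortion = importance * (w - i * step) ** 2
        return distortion + lam * rate_bits(i)

    return min(candidates, key=cost)


if __name__ == "__main__":
    # lam = 0 reduces to nearest-neighbor quantization.
    print(rd_quantize(0.12, 0.1, lam=0.0))                    # 1
    # A rate penalty pulls the small weight to the cheaper index 0.
    print(rd_quantize(0.12, 0.1, lam=0.01))                   # 0
    # A weight marked as important resists being zeroed out.
    print(rd_quantize(0.12, 0.1, lam=0.01, importance=10.0))  # 1
```

In this sketch, raising `lam` trades reconstruction error for fewer bits, while `importance` lets weights that matter more for accuracy keep a closer quantization point.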
Figures

Fig. 4: Sketch displaying the fundamental difference between the source coding and the model compression problem. In the source coding problem, the parameters of a model are quantized by minimizing a rate-distortion function as in eq. (6), whereas in the model compression problem they are quantized by minimizing one as in eq. (7). The main difference between the two paradigms lies in their measure of error: the former is based on a distance measure between the unquantized and quantized parameters, and the latter on the prediction performance of the quantized model. This results in different quantization schemes as solutions, which are displayed in the sketch. Different colors denote different parameter values. The different shapes correspond to different stages of the quantization procedure, with circles denoting the unquantized values, squares their respective integer representations and triangles their corresponding quantization points.
TABLE I: Compression ratios achieved at no loss of accuracy when applying different coding methods. DC-v1 & DC-v2 denote the two versions of DeepCABAC, whereas Lloyd denotes the weighted Lloyd algorithm and uniform the nearest-neighbor quantization scheme. For the latter two, we report the best compression results attained after applying the scalar Huffman, CSR-Huffman [38] and bzip2 lossless coding algorithms to the quantized networks. In parentheses are the resulting top-1 accuracies, and the sparsity ratios achieved, measured as $|w \neq 0| / |w|$.
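The two quantities reported in this caption can be computed as in the short Python sketch below. The function names are hypothetical and the inputs are made-up toy values, not data from the paper. Note that the sparsity ratio as defined in the caption is the fraction of non-zero weights.

```python
def sparsity_ratio(weights):
    """Fraction of non-zero entries, i.e. |w != 0| / |w| as in the caption."""
    if not weights:
        raise ValueError("weights must not be empty")
    nonzero = sum(1 for w in weights if w != 0)
    return nonzero / len(weights)


def compression_ratio(original_bytes, compressed_bytes):
    """Size of the uncompressed model divided by the size of the compressed one."""
    return original_bytes / compressed_bytes


if __name__ == "__main__":
    toy_weights = [0.0, 0.5, 0.0, -1.0]   # made-up example values
    print(sparsity_ratio(toy_weights))    # 0.5
    print(compression_ratio(1000, 250))   # 4.0
```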
TABLE II: Average bit-sizes per parameter for the SmallVGG16 network after applying different quantizers. DC-v1 & DC-v2 denote the two versions of DeepCABAC, whereas Lloyd denotes the weighted Lloyd algorithm and uniform corresponds to the nearest-neighbor quantization. We chose the networks that resided within the ±0.1 percentage point range from the accuracy attained after applying a uniform quantizer. In the case of the Lloyd and uniform quantizers, the sizes of the quantized networks were measured with respect to the entropy of their empirical probability mass distribution. In contrast, we measured the explicit average bit-size per parameter in DC-v1 and DC-v2.
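As an illustration of measuring size via the entropy of the empirical probability mass distribution, the following Python sketch computes the first-order entropy, in bits per parameter, of a list of quantized integer indices. The function name and the toy inputs are assumptions made for this example only.

```python
from collections import Counter
from math import log2


def entropy_bits_per_parameter(indices):
    """First-order entropy (bits per symbol) of the empirical distribution of `indices`."""
    if not indices:
        raise ValueError("indices must not be empty")
    n = len(indices)
    counts = Counter(indices)
    # H = sum_k p_k * log2(1 / p_k), with p_k = count_k / n
    return sum((c / n) * log2(n / c) for c in counts.values())


if __name__ == "__main__":
    print(entropy_bits_per_parameter([0, 0, 1, 1]))  # 1.0
    print(entropy_bits_per_parameter([0, 1, 2, 3]))  # 2.0
    print(entropy_bits_per_parameter([0, 0, 0, 0]))  # 0.0
```

This entropy is an estimate derived from symbol frequencies, whereas an explicit bit-size is obtained by actually encoding the parameters and counting the bits produced.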
TABLE IV: The number of clusters per layer employed for the experiment described in section V-A. 
TABLE V: The number of clusters for the whole network as well as the λ values used in the experiment described in section V-A.
Fig. 1: The general structure of codes. First, the encoder maps an input sample $w$ from a probability source $P(w)$ to a binary representation $b$ in a two-step process: it quantizes the input by mapping it to an integer $i = Q(w)$, and then maps that integer to its binary representation $b = B(i)$ by applying a binarization process. The decoder works analogously: it maps the binary representation back to its integer value by applying the inverse $B^{-1}(b) = i$, and subsequently assigns a reconstruction value (or quantization point) $Q^{-1}(i) = q$ to it. We stress that $Q^{-1}$ does not have to be the inverse of $Q$.
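The encoder/decoder structure in this caption can be sketched in a few lines of Python. The choices here are assumptions for illustration and are not CABAC's binarization: `quantize` is a uniform quantizer with a hypothetical step size, and `binarize` uses a zigzag mapping followed by a unary code so that it is trivially invertible.

```python
def quantize(w: float, step: float) -> int:
    """Q: map a real-valued sample to an integer index (uniform grid, an assumption)."""
    return round(w / step)


def binarize(i: int) -> str:
    """B: zigzag-map the signed integer to n >= 0, then emit n ones and a terminating zero."""
    n = 2 * i if i >= 0 else -2 * i - 1
    return "1" * n + "0"


def debinarize(b: str) -> int:
    """B^-1: count the leading ones and undo the zigzag mapping."""
    n = b.index("0")
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


def reconstruct(i: int, step: float) -> float:
    """Q^-1: assign a quantization point to the index; not an inverse of Q."""
    return i * step


if __name__ == "__main__":
    step = 0.1
    w = 0.26
    i = quantize(w, step)       # 3
    b = binarize(i)             # "1111110"
    i_dec = debinarize(b)       # 3
    q = reconstruct(i_dec, step)
    print(i, b, i_dec, q)
```

The binarization round-trips exactly, but the reconstructed value `q` is the quantization point near 0.3 rather than the original 0.26, which illustrates why $Q^{-1}$ need not be the inverse of $Q$.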
Citations
Clustered Federated Learning: Model-Agnostic Distributed Multitask Optimization Under Privacy Constraints
TL;DR: Clustered FL (CFL) exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions; it can be viewed as a post-processing method that always achieves performance greater than or equal to that of conventional FL by allowing clients to arrive at more specialized models.
696
•Posted Content
Clustered Federated Learning: Model-Agnostic Distributed Multi-Task Optimization under Privacy Constraints
TL;DR: Clustered FL (CFL) is a novel federated multitask learning (FMTL) framework that exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions, and comes with strong mathematical guarantees on the clustering quality.
Graphene memristive synapses for high precision neuromorphic computing.
TL;DR: Graphene-based multi-level (>16) and non-volatile memristive synapses with arbitrarily programmable conductance states and weight assignment based on k-means clustering are introduced, which offers greater computing accuracy when compared with uniform weight quantization for vector matrix multiplication, an essential component for any artificial neural network.
Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology
Stefan Studer, Thanh Binh Bui, Christian Drescher, Alexander Hanuschkin, Ludwig Winkler, Steven Peters, Klaus-Robert Müller
- 22 Apr 2021
TL;DR: An industry- and application-neutral process model tailored for machine learning applications with a focus on technical tasks for quality assurance is proposed, expanding on CRISP-DM, a data mining process model that enjoys strong industry support, but fails to address machine learning specific tasks.
Towards CRISP-ML(Q): A Machine Learning Process Model with Quality Assurance Methodology
Stefan Studer, Thanh Binh Bui, Christian Drescher, Alexander Hanuschkin, Ludwig Winkler, Steven Peters, Klaus-Robert Müller
- 03 Mar 2021
TL;DR: This work proposes a process model for the development of machine learning applications and expands on CRISP-DM, a data mining process model that enjoys strong industry support but fails to address machine learning specific tasks.
129
References
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
102.6K
Deep learning
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
67K
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
- 01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
51.9K
•Posted Content
Distilling the Knowledge in a Neural Network
TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.
21.2K
Deep learning in neural networks
TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, reviewing deep supervised learning, unsupervised learning, reinforcement learning and evolutionary computation, and indirect search for short programs encoding deep and large networks.
18.7K