GRACE: A Compressed Communication Framework for Distributed Machine Learning
Hang Xu,Chen-Yu Ho,Ahmed M. Abdelmoniem,Aritra Dutta,El Houcine Bergou,Konstantinos Karatsenidis,Marco Canini,Panos Kalnis
- 07 Jul 2021
- pp 561-572
TL;DR: In this article, the authors present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (i.e., quantization, sparsification, hybrid and low-rank).
Abstract: Powerful computer clusters are used nowadays to train complex deep neural networks (DNN) on large datasets. Distributed training increasingly becomes communication bound. For this reason, many lossy compression techniques have been proposed to reduce the volume of transferred data. Unfortunately, it is difficult to argue about the behavior of compression methods, because existing work relies on inconsistent evaluation testbeds and largely ignores the performance impact of practical system configurations. In this paper, we present a comprehensive survey of the most influential compressed communication methods for DNN training, together with an intuitive classification (i.e., quantization, sparsification, hybrid and low-rank). Next, we propose GRACE, a unified framework and API that allows for consistent and easy implementation of compressed communication on popular machine learning toolkits. We instantiate GRACE on TensorFlow and PyTorch, and implement 16 such methods. Finally, we present a thorough quantitative evaluation with a variety of DNNs (convolutional and recurrent), datasets and system configurations. We show that the DNN architecture affects the relative performance among methods. Interestingly, depending on the underlying communication library and computational cost of compression / decompression, we demonstrate that some methods may be impractical. GRACE and the entire benchmarking suite are available as open-source.
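To make the sparsification category concrete, here is a minimal NumPy sketch of Top-k compression with an error-feedback residual. It is an illustration under stated assumptions, not GRACE's API: the names `topk_compress`, `topk_decompress`, and `ErrorFeedback` are hypothetical, and a real deployment would operate on TensorFlow or PyTorch tensors and exchange the compressed values through a communication library.

```python
import numpy as np


def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; return (values, indices, shape)."""
    flat = grad.ravel()
    # argpartition finds the k largest |values| without a full sort.
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    return flat[idx], idx, grad.shape


def topk_decompress(values, idx, shape):
    """Rebuild a dense array that is zero everywhere except the kept entries."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)


class ErrorFeedback:
    """Hypothetical helper: carries the compression error into the next step."""

    def __init__(self):
        self.residual = None

    def step(self, grad, k):
        if self.residual is None:
            self.residual = np.zeros_like(grad)
        corrected = grad + self.residual           # add back what was dropped earlier
        values, idx, shape = topk_compress(corrected, k)
        sent = topk_decompress(values, idx, shape)
        self.residual = corrected - sent           # remember what was dropped now
        return values, idx, shape


ef = ErrorFeedback()
g = np.array([0.1, -2.0, 0.3, 1.5, -0.2])
values, idx, shape = ef.step(g, k=2)
dense = topk_decompress(values, idx, shape)
```

Only `values` and `idx` would need to cross the network here, so the transferred volume scales with `k` rather than with the gradient size; the residual stays local to each worker.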
Figures

TABLE II: Summary of the benchmarks and quality metrics used in this work. 
Fig. 1: Top-1 accuracy for VGG16 on CIFAR-10 with TensorFlow on 8 workers via 25 Gbps network links. In (b) Randk converges in 450s, but 8-bit quantization needs 1200s. 
Fig. 8: Latency of compress and decompress for different compressors with a range of input sizes.
Fig. 10: Performance of compressors for ResNet-50 on ImageNet via 1 Gbps network. Legend in Figure 6. 
Fig. 9: Throughput for ResNet-9 on CIFAR10 contrasting TCP vs. RDMA performance in PyTorch. 
TABLE I: Classification of surveyed gradient compression methods. Note that ‖g̃‖0 and ‖g‖0 are the number of elements in the compressed and uncompressed gradient, respectively; nature of operator Q is random or deterministic; EF-On indicates if error feedback is used in our experiments. We implement 16 methods on TensorFlow and PyTorch.
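The distinction Table I draws between random and deterministic operators, and the element counts of the compressed versus uncompressed gradient, can be sketched as follows. This is an assumption-laden illustration: `randk_compress`, `quantize_8bit`, and `dequantize_8bit` are hypothetical names, and the plain uniform 8-bit quantizer is chosen for simplicity; it is not necessarily the scheme evaluated in the paper.

```python
import numpy as np


def randk_compress(grad, k, rng):
    """Random operator: keep k uniformly sampled entries (indices vary per call)."""
    flat = grad.ravel()
    idx = rng.choice(flat.size, size=k, replace=False)
    return flat[idx], idx


def quantize_8bit(grad):
    """Deterministic operator: map every entry to one of 256 uniform levels."""
    lo, hi = float(grad.min()), float(grad.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.round((grad - lo) / scale).astype(np.uint8)
    return codes, lo, scale


def dequantize_8bit(codes, lo, scale):
    """Map 8-bit codes back to approximate float values."""
    return codes.astype(np.float32) * scale + lo


rng = np.random.default_rng(0)
g = rng.standard_normal(1000).astype(np.float32)

values, idx = randk_compress(g, k=10, rng=rng)
codes, lo, scale = quantize_8bit(g)
restored = dequantize_8bit(codes, lo, scale)

# Sparsification sends fewer elements; quantization sends all of them in fewer bits.
sparsity_ratio = values.size / g.size
max_error = float(np.abs(restored - g).max())
```

In this sketch the sparsifier transmits 1% of the elements at full precision, while the quantizer transmits every element at a quarter of the 32-bit width, with a per-element error bounded by about half a quantization step.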
Citations
AI-based Fog and Edge Computing: A Systematic Review, Taxonomy and Future Directions
Sundas Iftikhar,Sukhpal Singh Gill,Chenghao Song,Minxian Xu,Mohammad Sadegh Aslanpour,Adel Nadjaran Toosi,Junhui Du,Huaming Wu,Shreya Ghosh,Deepraj Chowdhury,Muhammed Golec,Mohit Kumar,Ahmed M. Abdelmoniem,Felix Cuadrado,Blesson Varghese,Omer Rana,Schahram Dustdar,Steve Uhlig
TL;DR: In this article, the role of AI/ML algorithms and the challenges in the applicability of these algorithms for resource management in fog/edge computing environments are analyzed using a systematic literature review (SLR).
Gradient Compression Supercharged High-Performance Data Parallel DNN Training
Youhui Bai,Cheng Li,Quan Zhou,Jun Yi,Ping Gong,Feng Yan,Ruichuan Chen,Yinlong Xu
- 26 Oct 2021
TL;DR: In this paper, a compression-aware gradient synchronization architecture, CaSync, is proposed to alleviate the communication bottleneck in data parallel deep neural network (DNN) training by significantly reducing the data volume of gradients for synchronization.
•Posted Content
Genuinely Distributed Byzantine Machine Learning
TL;DR: A new algorithm, ByzSGD, is presented, which solves the general Byzantine-resilient distributed machine learning problem by relying on three major schemes, including Scatter/Gather, Distributed Median Contraction, and Minimum-Diameter Averaging, whose goal is to tolerate Byzantine workers.
Empirical analysis of federated learning in heterogeneous environments
Ahmed M. Abdelmoniem,Chen-Yu Ho,Pantelis Papageorgiou,Marco Canini
- 05 Apr 2022
TL;DR: An extensive empirical study spanning close to 1.5K unique configurations on five popular FL benchmarks shows that these sources of heterogeneity have a major impact on both model performance and fairness, thus shedding light on the importance of considering heterogeneity in FL system design.
A Compressed Gradient Tracking Method for Decentralized Optimization With Linear Convergence
TL;DR: In this article, a compressed gradient tracking algorithm (C-GT) was proposed to solve the decentralized optimization problem under limited communication, where the global objective is to minimize the average of local cost functions over a multiagent network using only local computation and peer-to-peer communication.
References
Deep Residual Learning for Image Recognition
Kaiming He,Xiangyu Zhang,Shaoqing Ren,Jian Sun
- 27 Jun 2016
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the framework won 1st place on the ILSVRC 2015 classification task.
•Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba
- 01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
•Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan,Andrew Zisserman
- 04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger,Philipp Fischer,Thomas Brox
- 05 Oct 2015
TL;DR: Ronneberger et al. propose a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.