Proceedings Article · DOI: 10.1109/ALLERTON.2018.8635933
Cross-Iteration Coded Computing
Farzin Haddadpour, Yaoqing Yang, Viveck R. Cadambe, Pulkit Grover
- 01 Oct 2018
- pp 196-203
- Cited by 23
TL;DR: Introduces cross-iteration coded computing, an approach to reducing communication costs for a large class of distributed iterative algorithms involving linear operations, including gradient descent and accelerated gradient descent for quadratic loss functions.
Abstract: We introduce the idea of cross-iteration coded computing, an approach to reducing communication costs for a large class of distributed iterative algorithms involving linear operations, including gradient descent and accelerated gradient descent for quadratic loss functions. The state-of-the-art approach for these iterative algorithms involves performing one iteration of the algorithm per round of communication among the nodes. In contrast, our approach performs multiple iterations of the underlying algorithm in a single round of communication by incorporating some redundancy in storage and computation. Our algorithm works in the master-worker setting, with the workers storing carefully constructed linear transformations of input matrices and using these matrices in an iterative algorithm, and the master node inverting the effect of these linear transformations. In addition to reduced communication costs, a trivial generalization of our algorithm also provides resilience to stragglers and failures. The degree of redundancy of our algorithm can be tuned based on the amount of communication and straggler resilience required. Finally, we describe a variant of our algorithm that can flexibly recover the results based on the degree of straggling in the worker nodes. This variant allows performance to degrade gracefully as the number of successful (non-straggling) workers decreases.
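The abstract's central observation, that several iterations can be executed per communication round when the updates are linear, can be illustrated for gradient descent on a quadratic loss. As a minimal sketch under stated assumptions (an illustration, not the paper's coded construction), take $f(x) = \tfrac{1}{2}x^\top A x - b^\top x$ with symmetric $A$ and step size $\eta$. Each step is then the affine map $x_{t+1} = (I - \eta A)x_t + \eta b$, so $k$ steps collapse into $x_{t+k} = M^k x_t + c_k$ with $M = I - \eta A$ and $c_k = \eta\sum_{j=0}^{k-1} M^j b$. The helper names `gd_steps` and `collapse` are hypothetical.

```python
# Minimal sketch: k gradient-descent steps on a quadratic loss collapse into
# one affine map, so they can be applied with a single matrix-vector product.
# Assumed loss: f(x) = 0.5 * x^T A x - b^T x (A symmetric), step size eta.


def mat_vec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def mat_mul(P, Q):
    """Dense matrix-matrix product on nested lists."""
    cols = list(zip(*Q))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in P]


def gd_steps(A, b, x, eta, k):
    """Run k ordinary gradient-descent steps, one gradient (A x - b) per step."""
    for _ in range(k):
        grad = [ax - bi for ax, bi in zip(mat_vec(A, x), b)]
        x = [xi - eta * g for xi, g in zip(x, grad)]
    return x


def collapse(A, b, eta, k):
    """Precompute (M^k, c_k) with M = I - eta*A, so x_{t+k} = M^k x_t + c_k."""
    n = len(A)
    M = [[(1.0 if i == j else 0.0) - eta * A[i][j] for j in range(n)] for i in range(n)]
    Mk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    c = [0.0] * n
    for _ in range(k):
        c = [mc + eta * bi for mc, bi in zip(mat_vec(M, c), b)]  # c_{j+1} = M c_j + eta*b
        Mk = mat_mul(M, Mk)                                      # M^{j+1} = M M^j
    return Mk, c


if __name__ == "__main__":
    A = [[2.0, 0.5], [0.5, 1.0]]
    b = [1.0, -1.0]
    x0 = [0.0, 0.0]
    Mk, c = collapse(A, b, eta=0.1, k=5)
    packed = [v + ci for v, ci in zip(mat_vec(Mk, x0), c)]
    print(gd_steps(A, b, x0, eta=0.1, k=5))  # five products with A, one per step
    print(packed)                            # same iterate from one product with M^5
```

In a distributed setting where $A = \sum_i A_i$ is split across workers, $M^k$ does not decompose into per-worker powers (for example, $(A_1 + A_2)^2 \neq A_1^2 + A_2^2$ in general), so workers cannot simply iterate on their own shards. The abstract's carefully constructed linear transformations, stored with redundancy, address this; that construction is not reproduced here.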
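The abstract states that a generalization of the algorithm also provides resilience to stragglers and failures. The basic mechanism by which redundant storage buys this can be sketched with a textbook sum-parity code for a matrix-vector product, in the spirit of the coded matrix multiplication in "Speeding Up Distributed Machine Learning Using Codes" (listed under References); it is not the paper's cross-iteration scheme. The sketch assumes three workers, two holding the row halves $A_1, A_2$ of $A$ and one holding $A_1 + A_2$, and the function names `encode` and `decode` are hypothetical.

```python
# Sketch of a sum-parity code for a distributed matrix-vector product A x:
# three hypothetical workers hold A1, A2 and A1 + A2, so the master can
# recover A x from whichever two workers respond first.


def mat_vec(M, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def encode(A):
    """Split A row-wise into halves A1, A2 and append the parity block A1 + A2."""
    if len(A) % 2 != 0:
        raise ValueError("this sketch assumes an even number of rows")
    h = len(A) // 2
    A1, A2 = A[:h], A[h:]
    parity = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A x from any two worker outputs.

    `results` maps worker index (0: A1, 1: A2, 2: parity) to that worker's
    partial product; missing keys are stragglers or failed workers.
    """
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results and 2 in results:
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]  # A2 x = (A1 + A2) x - A1 x
    elif 1 in results and 2 in results:
        y2 = results[1]
        y1 = [p - a for p, a in zip(results[2], y2)]  # A1 x = (A1 + A2) x - A2 x
    else:
        raise ValueError("need outputs from at least two of the three workers")
    return y1 + y2  # stack the two row blocks


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    x = [1.0, -1.0]
    outputs = {i: mat_vec(B, x) for i, B in enumerate(encode(A))}
    del outputs[1]  # pretend worker 1 straggles
    print(decode(outputs))  # → [-1.0, -1.0, -1.0, -1.0]
```

Any two of the three responses suffice, at 1.5x the storage of an uncoded split. More parity blocks (for example, an MDS code over more workers) tolerate more stragglers at the cost of more storage, which mirrors the redundancy/resilience trade-off the abstract describes.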
Citations
“Short-Dot”: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
TL;DR: The key novelty in this work is that, in the particular regime where the number of available processing nodes is greater than the total number of dot products, Short-Dot has a lower expected computation time than existing strategies under an exponential model of straggling.
On the Optimal Recovery Threshold of Coded Matrix Multiplication
Sanghamitra Dutta, Mohammad Fahim, Farzin Haddadpour, Haewon Jeong, Viveck R. Cadambe, Pulkit Grover
TL;DR: Provides novel coded computation strategies for distributed matrix-matrix products that outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required number of successful workers.
Cited by 302
Posted Content
On the Convergence of Local Descent Methods in Federated Learning
TL;DR: The obtained convergence rates are the sharpest known to date for local descent methods with periodic averaging applied to nonconvex federated optimization, in both centralized and networked distributed settings.
Posted Content
"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products
TL;DR: In this paper, the authors propose a technique called Short-Dot to reduce the number of redundant computations, in a coding-theory-inspired fashion, for computing linear transforms of long vectors.
Cited by 202
Posted Content
Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
TL;DR: This paper shows that for loss functions that satisfy the Polyak-Łojasiewicz condition, rounds of communication suffice to achieve a linear speedup, that is, an error of $O(1/pT)$, where $T$ is the total number of model updates at each worker.
Cited by 143
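Two of the citing works above analyze local descent methods with periodic averaging, which also perform several iterations per communication round but, in general, do not reproduce the centralized iterates exactly. A minimal deterministic sketch, assuming hypothetical scalar quadratic losses $f_i(x) = \tfrac{1}{2}a_i(x - m_i)^2$ on each worker and a hypothetical function name `local_descent_periodic_averaging`:

```python
# Sketch of local gradient descent with periodic averaging on hypothetical
# scalar quadratic losses f_i(x) = 0.5 * a_i * (x - m_i)**2, one per worker.


def local_descent_periodic_averaging(centers, curvatures, x0, eta, tau, rounds):
    """Each worker takes `tau` local gradient steps per round; the models are
    then averaged once, so communication happens `rounds` times in total."""
    x = x0
    for _ in range(rounds):
        local_models = []
        for m, a in zip(centers, curvatures):
            xi = x
            for _ in range(tau):
                xi -= eta * a * (xi - m)  # local step, no communication
            local_models.append(xi)
        x = sum(local_models) / len(local_models)  # one averaging round
    return x


if __name__ == "__main__":
    # Equal curvatures: the global minimizer is the mean of the centers, 3.0.
    x = local_descent_periodic_averaging([1.0, 2.0, 6.0], [1.0, 1.0, 1.0],
                                         x0=0.0, eta=0.1, tau=5, rounds=50)
    print(x)
```

With equal curvatures the averaged model converges to the global minimizer (the mean of the $m_i$). With unequal curvatures and $\tau > 1$ local steps, the fixed point is instead a weighted mean of the $m_i$ with weights $1 - (1 - \eta a_i)^{\tau}$, which generally differs from the global minimizer $\sum_i a_i m_i / \sum_i a_i$.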
References
Proceedings Article
Parallelized Stochastic Gradient Descent
Martin Zinkevich, Markus Weimer, Lihong Li, Alexander J. Smola
- 06 Dec 2010
TL;DR: This paper presents the first parallel stochastic gradient descent algorithm with a detailed analysis and experimental evidence, and introduces a novel proof technique, based on contractive mappings, to quantify the speed of convergence of parameter distributions to their asymptotic limits.
Posted Content
Revisiting Distributed Synchronous SGD
TL;DR: It is demonstrated that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating the effect of the worst stragglers; the approach is empirically validated and shown to converge faster and to better test accuracies.
Speeding Up Distributed Machine Learning Using Codes
TL;DR: In this paper, the authors provide theoretical insights on how coded solutions can achieve significant gains compared to uncoded ones for matrix multiplication and data shuffling in large-scale distributed systems.
Cited by 923
Speeding Up Distributed Machine Learning Using Codes
TL;DR: In this paper, the authors provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones for matrix multiplication and data shuffling in large-scale distributed systems.
Cited by 798
Posted Content
Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization
Shai Shalev-Shwartz, Tong Zhang
TL;DR: A new analysis of Stochastic Dual Coordinate Ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.
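The reference "Revisiting Distributed Synchronous SGD" takes a different route to straggler mitigation: instead of coding, it launches more workers than needed and aggregates only the fastest ones. A minimal sketch, assuming simulated finish times and a hypothetical function `backup_worker_step`:

```python
# Sketch of one synchronous SGD step with backup workers: launch more workers
# than needed and aggregate only the fastest n_needed gradients.


def backup_worker_step(x, grads, latencies, n_needed, eta):
    """Return (updated model, simulated step time).

    grads[i] and latencies[i] are worker i's gradient and finish time; the
    slowest len(grads) - n_needed workers are treated as stragglers and dropped.
    """
    if not 0 < n_needed <= len(grads):
        raise ValueError("n_needed must be between 1 and the number of workers")
    order = sorted(range(len(grads)), key=lambda i: latencies[i])
    chosen = order[:n_needed]
    avg = [sum(grads[i][d] for i in chosen) / n_needed for d in range(len(x))]
    step_time = latencies[order[n_needed - 1]]  # wait only for the n_needed-th finisher
    return [xd - eta * gd for xd, gd in zip(x, avg)], step_time


if __name__ == "__main__":
    grads = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [100.0, 100.0]]
    latencies = [1.0, 1.2, 0.9, 5.0]  # worker 3 is a straggler
    x_new, t = backup_worker_step([0.0, 0.0], grads, latencies, n_needed=3, eta=0.5)
    print(x_new, t)
```

Unlike a coded scheme, the dropped workers' gradients simply do not contribute to that step, so the method trades a smaller effective mini-batch for a shorter wait.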