Cross-Iteration Coded Computing

doi:10.1109/ALLERTON.2018.8635933

•Proceedings Article•10.1109/ALLERTON.2018.8635933

Farzin Haddadpour, +3 more

- 01 Oct 2018

- pp 196-203

23

TL;DR: Introduces cross-iteration coded computing, an approach to reducing communication costs for a large class of distributed iterative algorithms involving linear operations, including gradient descent and accelerated gradient descent for quadratic loss functions.

Abstract: We introduce the idea of cross-iteration coded computing, an approach to reducing communication costs for a large class of distributed iterative algorithms involving linear operations, including gradient descent and accelerated gradient descent for quadratic loss functions. The state-of-the-art approach for these iterative algorithms involves performing one iteration of the algorithm per round of communication among the nodes. In contrast, our approach performs multiple iterations of the underlying algorithm in a single round of communication by incorporating some redundancy in storage and computation. Our algorithm works in the master-worker setting, with the workers storing carefully constructed linear transformations of input matrices and using these matrices in an iterative algorithm, and with the master node inverting the effect of these linear transformations. In addition to reduced communication costs, a trivial generalization of our algorithm also includes resilience to stragglers and failures. The degree of redundancy of our algorithm can be tuned based on the amount of communication and straggler resilience required. Finally, we also describe a variant of our algorithm that can flexibly recover the results based on the degree of straggling in the worker nodes. The variant allows the performance to degrade gracefully as the number of successful (non-straggling) workers is lowered.
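
The abstract relies on the fact that gradient descent for a quadratic loss is a linear (affine) iteration, so several iterations can be collapsed into a single linear operation. The sketch below illustrates only that underlying fact, not the paper's coding construction: it assumes the loss f(x) = 0.5*||Ax - b||^2 and a fixed step size, and checks that k sequential updates match one precomputed affine map. All function names (`gd_affine_map`, `compose_k_steps`) and the example data are hypothetical.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def matvec(P, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def gd_affine_map(A, b, eta):
    """Return (M, c) such that one gradient step on 0.5*||Ax - b||^2 is x -> M x + c."""
    At = [list(col) for col in zip(*A)]
    AtA = matmul(At, A)
    n = len(AtA)
    I = identity(n)
    M = [[I[i][j] - eta * AtA[i][j] for j in range(n)] for i in range(n)]
    c = [eta * y for y in matvec(At, b)]
    return M, c

def compose_k_steps(M, c, k):
    """Collapse k applications of x -> M x + c into a single map x -> Mk x + ck."""
    n = len(M)
    Mk, ck = identity(n), [Fraction(0)] * n
    for _ in range(k):
        # Appending one more step gives x -> M (Mk x + ck) + c.
        Mk = matmul(M, Mk)
        ck = [u + v for u, v in zip(matvec(M, ck), c)]
    return Mk, ck

# Assumed toy problem: 3 samples, 2 parameters, exact rational arithmetic.
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)], [Fraction(5), Fraction(6)]]
b = [Fraction(1), Fraction(0), Fraction(-1)]
eta = Fraction(1, 100)
x0 = [Fraction(0), Fraction(0)]
k = 4

M, c = gd_affine_map(A, b, eta)

# k ordinary iterations, one after another.
x_seq = x0
for _ in range(k):
    x_seq = [u + v for u, v in zip(matvec(M, x_seq), c)]

# The same k iterations applied as one affine map.
Mk, ck = compose_k_steps(M, c, k)
x_once = [u + v for u, v in zip(matvec(Mk, x0), ck)]
```

Exact fractions are used so that the two results can be compared with `==` rather than a floating-point tolerance. How the paper distributes such a multi-step computation across workers, and which linear transformations the workers store, is not shown here.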

Citations

•Journal Article•10.1109/TIT.2019.2927558

“Short-Dot”: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Sanghamitra Dutta, +2 more

- 09 Jul 2019

- IEEE Transactions on Information Theory

TL;DR: The key novelty in this work is that in the particular regime where the number of available processing nodes is greater than the total number of dot products, Short-Dot has lower expected computation time under straggling under an exponential model compared to existing strategies.

335

•Journal Article•10.1109/TIT.2019.2929328

On the Optimal Recovery Threshold of Coded Matrix Multiplication

Sanghamitra Dutta, +5 more

- 01 Jan 2020

- IEEE Transactions on Information Theory

TL;DR: Provides novel coded computation strategies for distributed matrix–matrix products that outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required number of successful workers.

302

•Posted Content

On the Convergence of Local Descent Methods in Federated Learning.

Farzin Haddadpour, +1 more

- 31 Oct 2019

- arXiv: Learning

TL;DR: The obtained convergence rates are the sharpest known to date on the convergence of local descent methods with periodic averaging for solving nonconvex federated optimization in both centralized and networked distributed optimization.

287

•Posted Content

"Short-Dot": Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Sanghamitra Dutta, +2 more

- 18 Apr 2017

- arXiv: Information Theory

TL;DR: In this paper, the authors propose a technique called Short-Dot to reduce the number of redundant computations in a coding theory inspired fashion for computing linear transforms of long vectors.

202

•Posted Content

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization

Farzin Haddadpour, +3 more

- 30 Oct 2019

- arXiv: Learning

TL;DR: This paper shows that for loss functions that satisfy the Polyak-Łojasiewicz condition, rounds of communication suffice to achieve a linear speed up, that is, an error of $O(1/pT)$, where $T$ is the total number of model updates at each worker.

143

References

•Proceedings Article

Parallelized Stochastic Gradient Descent

Martin Zinkevich, +3 more

- 06 Dec 2010

TL;DR: This paper presents the first parallel stochastic gradient descent algorithm, including a detailed analysis and experimental evidence, and introduces a novel proof technique, contractive mappings, to quantify the speed of convergence of parameter distributions to their asymptotic limits.

1.5K

•Posted Content

Revisiting Distributed Synchronous SGD

Jianmin Chen, +3 more

- 18 Feb 2016

- arXiv: Learning

TL;DR: It is demonstrated that a third approach, synchronous optimization with backup workers, can avoid asynchronous noise while mitigating for the worst stragglers and is empirically validated and shown to converge faster and to better test accuracies.

955

•Journal Article•10.1109/TIT.2017.2736066

Speeding Up Distributed Machine Learning Using Codes

Kangwook Lee, +4 more

- 08 Dec 2015

- arXiv: Distributed, Parallel, and Cluste...

TL;DR: In this paper, the authors provide theoretical insights on how coded solutions can achieve significant gains compared to uncoded ones for matrix multiplication and data shuffling in large-scale distributed systems.

923

•Journal Article•10.1109/TIT.2017.2736066

Speeding Up Distributed Machine Learning Using Codes

Kangwook Lee, +4 more

- 01 Mar 2018

- IEEE Transactions on Information Theory

TL;DR: In this paper, the authors provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones for matrix multiplication and data shuffling in large-scale distributed systems.

798
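
As a rough illustration of how coding can help matrix multiplication tolerate stragglers, the sketch below uses a toy (3, 2) scheme: the matrix is split into two row blocks, a third worker holds their sum, and the product is recovered from any two of the three workers. This is an assumed minimal example rather than the construction analyzed in the entry above; the names `encode` and `decode` and the data are hypothetical.

```python
from itertools import combinations

def matvec(P, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def encode(A1, A2):
    """Three worker tasks: block A1, block A2, and the parity block A1 + A2."""
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return {0: A1, 1: A2, 2: parity}

def decode(results):
    """Recover (A1 x, A2 x) from the results of any two of the three workers."""
    if 0 in results and 1 in results:
        return results[0], results[1]
    if 0 in results:
        # Have A1 x and (A1 + A2) x: subtract to obtain A2 x.
        return results[0], [p - u for p, u in zip(results[2], results[0])]
    # Have A2 x and (A1 + A2) x: subtract to obtain A1 x.
    return [p - u for p, u in zip(results[2], results[1])], results[1]

# Assumed toy data: a 4x2 matrix stored as two 2x2 row blocks.
A1 = [[1, 2], [3, 4]]
A2 = [[5, 6], [7, 8]]
x = [1, -1]

tasks = encode(A1, A2)
expected = matvec(A1 + A2, x)  # uncoded product with the stacked matrix

recovered = {}
for finished in combinations(tasks, 2):  # every pair of non-straggling workers
    results = {w: matvec(tasks[w], x) for w in finished}
    top, bottom = decode(results)
    recovered[finished] = top + bottom  # stack the two halves of A x
```

Each of the three possible pairs of finished workers yields the same product as the uncoded computation, so one slow or failed worker does not delay the result. The price is the extra parity task, which is the kind of redundancy such schemes trade against straggler tolerance.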

•Posted Content

Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Shai Shalev-Shwartz, +1 more

- 10 Sep 2012

- arXiv: Machine Learning

TL;DR: A new analysis of Stochastic Dual Coordinate Ascent (SDCA) is presented, showing that this class of methods enjoys strong theoretical guarantees that are comparable to or better than those of SGD.

724