Proceedings Article, DOI: 10.1109/ALLERTON.2012.6483403
Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning
Konstantinos I. Tsianos, Sean Lawlor, Michael G. Rabbat
- 01 Oct 2012
- pp 1543-1550
TL;DR: The experiments illustrate the benefits of using asynchronous consensus-based distributed optimization when some nodes are unreliable and may fail or when messages experience time-varying delays.
Abstract: This paper discusses practical consensus-based distributed optimization algorithms. In consensus-based optimization algorithms, nodes interleave local gradient descent steps with consensus iterations. Gradient steps drive the solution to a minimizer, while the consensus iterations synchronize the values so that all nodes converge to a network-wide optimum when the objective is convex and separable. The consensus update requires communication. If communication is synchronous and nodes wait to receive one message from each of their neighbors before updating, then progress is limited by the slowest node. To be robust to failing or stalling nodes, asynchronous communications should be used. Asynchronous protocols using bi-directional communications cause deadlock, and so one-directional protocols are necessary. However, with one-directional asynchronous protocols, it is no longer possible to guarantee that the consensus matrix is doubly stochastic. At the same time, it is essential that the coordination protocol achieve consensus on the average to avoid biasing the optimization objective. We report on experiments running Push-Sum Distributed Dual Averaging for convex optimization on an MPI cluster. The experiments illustrate the benefits of using asynchronous consensus-based distributed optimization when some nodes are unreliable and may fail or when messages experience time-varying delays.
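The abstract's point about doubly stochastic matrices can be illustrated with a minimal sketch of push-sum averaging. This is not the paper's implementation: it covers only the consensus step (no dual averaging or gradient steps), it runs in synchronous rounds for simplicity, and the function name `push_sum`, the four-node directed graph, and the initial values are all hypothetical. Each node splits its numerator and weight equally among itself and its out-neighbors, which gives a column-stochastic but not doubly stochastic mixing matrix.

```python
# Push-sum averaging on a small directed graph (illustrative sketch only).
# The graph, initial values, and iteration count are hypothetical choices.

def push_sum(values, out_neighbors, iterations=200):
    """Return (ratio estimates x_i / w_i, raw numerators x_i) after mixing."""
    n = len(values)
    x = list(values)      # numerators, initialized to the local values
    w = [1.0] * n         # push-sum weights, initialized to 1
    for _ in range(iterations):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Node i keeps one share and pushes one share to each out-neighbor.
            targets = [i] + out_neighbors[i]
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    estimates = [xi / wi for xi, wi in zip(x, w)]
    return estimates, x

# Strongly connected directed graph: 0->1, 1->2, 1->3, 2->0, 3->0.
values = [1.0, 5.0, 9.0, 13.0]
out_neighbors = {0: [1], 1: [2, 3], 2: [0], 3: [0]}

estimates, numerators = push_sum(values, out_neighbors)
average = sum(values) / len(values)

print("true average:      ", average)
print("push-sum estimates:", estimates)
print("raw numerators:    ", numerators)
```

In this sketch the raw numerators settle at different values across nodes because the mixing matrix is only column stochastic, while the ratios `x_i / w_i` agree on the true average. That correction is what lets a one-directional protocol avoid biasing the objective.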
Citations
Distributed optimization over time-varying directed graphs
Angelia Nedic, Alex Olshevsky
- 01 Jan 2013
TL;DR: This work develops a broadcast-based algorithm, termed subgradient-push, which steers every node to an optimal value under a standard assumption of subgradient boundedness. The algorithm converges at a rate of O(ln t/√t), where the constant depends on the initial values at the nodes, the subgradient norms, and, more interestingly, on both the consensus speed and the imbalances of influence among the nodes.
1.3K
Network Topology and Communication-Computation Tradeoffs in Decentralized Optimization
Angelia Nedic, Alex Olshevsky, Michael G. Rabbat
- 17 Apr 2018
TL;DR: This paper presents an overview of recent work in decentralized optimization and surveys the state-of-the-art algorithms and their analyses tailored to these different scenarios, highlighting the role of the network topology.
On the Convergence of Decentralized Gradient Descent
Kun Yuan, Qing Ling, Wotao Yin
TL;DR: This paper studies the convergence of the decentralized gradient descent method, in which each agent updates its local variable by combining the average of its neighbors' variables with a local negative-gradient step.
636
Stochastic Gradient-Push for Strongly Convex Functions on Time-Varying Directed Graphs
Angelia Nedic, Alex Olshevsky
TL;DR: This article investigates the convergence rate of the subgradient-push algorithm for strongly convex functions with Lipschitz gradients and shows that it converges at a rate of O((ln t)/t) even when only stochastic gradient samples are available.
393
References
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
- 06 Dec 2004
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
•Book
Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein
- 23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Consensus and Cooperation in Networked Multi-Agent Systems
Reza Olfati-Saber, J.A. Fax, Richard M. Murray
- 05 Mar 2007
TL;DR: A theoretical framework for analysis of consensus algorithms for multi-agent networked systems with an emphasis on the role of directed information flow, robustness to changes in network topology due to link/node failures, time-delays, and performance guarantees is provided.
•Book
Parallel and Distributed Computation: Numerical Methods
Dimitri P. Bertsekas, John N. Tsitsiklis
- 01 Jan 1989
TL;DR: This work discusses parallel and distributed architectures, complexity measures, and communication and synchronization issues, and it presents both Jacobi and Gauss-Seidel iterations, which serve as algorithms of reference for many of the computational approaches addressed later.
7K