A differential equation for modeling Nesterov's accelerated gradient method: theory and insights

Open Access • Journal Article

Weijie J. Su, +2 more

- 01 Jan 2016

- Journal of Machine Learning Research

- Vol. 17, Iss: 1, pp 5312-5354

1.1K

TL;DR: A second-order ordinary differential equation is derived as the continuous-time limit of Nesterov's accelerated gradient method, and it is shown that this ODE allows for a better understanding of Nesterov's scheme.
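
For illustration only: the ODE in question is X'' + (3/t) X' + ∇f(X) = 0 with X(0) = x0 and X'(0) = 0, obtained from Nesterov's scheme by letting the step size s go to zero under the identification t ≈ k√s. The sketch below runs the discrete scheme on a small quadratic and compares the objective with a commonly quoted form of the O(1/(s k²)) guarantee, 2‖x0 − x*‖² / (s (k+1)²). The test function, the starting point, the step size, and the names `nesterov`, `CURVATURES`, `f`, and `grad_f` are assumptions chosen for this example; none of them come from the page.

```python
def nesterov(grad, x0, step, iters):
    """Nesterov's scheme with momentum coefficient (k - 1) / (k + 2).

    x_k = y_{k-1} - step * grad(y_{k-1})
    y_k = x_k + (k - 1) / (k + 2) * (x_k - x_{k-1})
    Returns the list [x_0, x_1, ..., x_iters].
    """
    x_prev = list(x0)
    y = list(x0)
    history = [list(x0)]
    for k in range(1, iters + 1):
        g = grad(y)
        x = [yi - step * gi for yi, gi in zip(y, g)]
        momentum = (k - 1) / (k + 2)
        y = [xi + momentum * (xi - pi) for xi, pi in zip(x, x_prev)]
        x_prev = x
        history.append(x)
    return history


# Hypothetical test problem: f(x) = 0.5 * (x_1^2 + 100 * x_2^2),
# minimized at x* = 0 with f* = 0 and gradient Lipschitz constant L = 100.
CURVATURES = [1.0, 100.0]


def f(x):
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad_f(x):
    return [c * xi for c, xi in zip(CURVATURES, x)]


x0 = [1.0, 1.0]
step = 1.0 / max(CURVATURES)        # s = 1/L
history = nesterov(grad_f, x0, step, iters=200)

dist_sq = sum(xi * xi for xi in x0)  # ||x0 - x*||^2, since x* = 0
for k in (1, 10, 100, 200):
    bound = 2.0 * dist_sq / (step * (k + 1) ** 2)
    print(k, f(history[k]), bound)
```

With s = 1/L the printed objective values should stay below the bound in the last column; iteration k corresponds roughly to time t = k√s on the ODE trajectory.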

Citations

•Journal Article•10.1016/J.ORL.2017.08.013

Convergence of first-order methods via the convex conjugate

Javier Peña

- 01 Nov 2017

- Operations Research Letters

TL;DR: This paper gives a unified and succinct approach to the convergence rates of the subgradient, gradient, and accelerated gradient methods for unconstrained convex minimization.

9

•Journal Article•10.1515/ANONA-2020-0143

Inducing strong convergence of trajectories in dynamical systems associated to monotone inclusions with composite structure

Radu Ioan Bot, +3 more

- 25 Aug 2020

- Advances in Nonlinear Analysis

TL;DR: The aim is to design methods which guarantee strong convergence of trajectories towards the minimum norm solution of the underlying monotone inclusion problem, and to investigate in detail the asymptotic behavior of dynamical systems perturbed by a Tikhonov regularization.

9

•Posted Content

Fixed-time Distributed Optimization under Time-Varying Communication Topology

Kunal Garg, +3 more

- 24 May 2019

- arXiv: Systems and Control

TL;DR: In this paper, a nonlinear protocol for achieving distributed optimization for time-varying communication topology within a fixed time independent of the initial conditions is presented, where each agent in the network can access its private objective function, while exchange of local information is permitted between the neighbors.

9

•Posted Content

Tikhonov regularization of a second order dynamical system with Hessian driven damping

Radu Ioan Bot, +2 more

- 28 Nov 2019

- arXiv: Optimization and Control

TL;DR: The asymptotic properties of the trajectories generated by a second-order dynamical system with Hessian-driven damping and a Tikhonov regularization term are investigated, and strong convergence of the trajectories to the minimum-norm minimizer of the objective function is derived.

9

•Proceedings Article•10.1109/CDC40024.2019.9028936

Hybrid Robust Optimal Resource Allocation with Momentum

Daniel E. Ochoa, +3 more

- 01 Dec 2019

TL;DR: A hybrid regularization is presented that induces the property of uniform asymptotic stability in the system by using the invariance principle for well-posed hybrid dynamical systems and establishing the existence of strictly positive margins of robustness with respect to arbitrarily small disturbances.

9

References

•Book

Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers

Stephen Boyd, +4 more

- 23 May 2011

TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.

20.5K

•Journal Article•10.1137/080716542

A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems

Amir Beck, +1 more

- 01 Jan 2009

- SIAM Journal on Imaging Sciences

TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) is presented, which preserves the computational simplicity of ISTA but has a global rate of convergence that is proven to be significantly better, both theoretically and practically.

14.3K

•Proceedings Article

On the importance of initialization and momentum in deep learning

Ilya Sutskever, +3 more

- 16 Jun 2013

TL;DR: It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.

5K

•Book

Proximal Algorithms

Neal Parikh, +1 more

- 27 Nov 2013

TL;DR: The many different interpretations of proximal operators and algorithms are discussed, their connections to many other topics in optimization and applied mathematics are described, some popular algorithms are surveyed, and a large number of examples of proximal operators that commonly arise in practice are provided.

4.2K

•Journal Article

A Treatise On The Theory Of Bessel Functions Ed.1st

G. N. Watson

- 01 Jan 1922

- Birchandra State Central Library, Tripura

Abstract: 1. Bessel functions before 1826; 2. The Bessel coefficients; 3. Bessel functions; 4. Differential equations; 5. Miscellaneous properties of Bessel functions; 6. Integral representations of Bessel functions; 7. Asymptotic expansions of Bessel functions; 8. Bessel functions of large order; 9. Polynomials associated with Bessel functions; 10. Functions associated with Bessel functions; 11. Addition theorems; 12. Definite integrals; 13. Infinite integrals; 14. Multiple integrals; 15. The zeros of Bessel functions; 16. Neumann series and Lommel's functions of two variables; 17. Kapteyn series; 18. Series of Fourier-Bessel and Dini; 19. Schlömilch series; 20. The tabulation of Bessel functions; Tables of Bessel functions; Bibliography; Indices.

4.2K

Related Papers (5)

A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems

Some methods of speeding up the convergence of iteration methods

Introductory Lectures on Convex Optimization: A Basic Course

On the importance of initialization and momentum in deep learning

Smooth minimization of non-smooth functions