Open Access • Journal Article
A differential equation for modeling Nesterov's accelerated gradient method: theory and insights
TL;DR: A second-order ordinary differential equation is derived as the limit of Nesterov's accelerated gradient method, and the continuous-time ODE is shown to allow a better understanding of Nesterov's scheme.
Abstract: We derive a second-order ordinary differential equation (ODE) which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov's scheme leading to an algorithm, which can be rigorously proven to converge at a linear rate whenever the objective is strongly convex.
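To make the "approximate equivalence" concrete, here is a minimal sketch rather than the authors' code. It assumes the ODE derived in the paper, X'' + (3/t) X' + ∇f(X) = 0 with X(0) = x0 and X'(0) = 0, and the identification t ≈ k√s between iteration count k and time t for step size s. The one-dimensional objective f(x) = x²/2, the step size, and the function names `nesterov_quadratic` and `ode_solution` are illustrative choices. For this objective the ODE has the closed-form solution X(t) = 2·x0·J_1(t)/t, where J_1 is the Bessel function of the first kind of order one, evaluated below by its power series.

```python
import math


def nesterov_quadratic(x0, s, n_steps):
    """Run Nesterov's scheme on f(x) = x**2 / 2 (so grad f(x) = x).

    x_k = y_{k-1} - s * grad f(y_{k-1})
    y_k = x_k + (k - 1) / (k + 2) * (x_k - x_{k-1})
    Returns the list [x_0, x_1, ..., x_{n_steps}].
    """
    xs = [x0]
    x_prev, y = x0, x0
    for k in range(1, n_steps + 1):
        x = y - s * y                              # gradient step taken from y_{k-1}
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # momentum (extrapolation) step
        xs.append(x)
        x_prev = x
    return xs


def ode_solution(x0, t, n_terms=60):
    """Closed-form ODE solution X(t) = 2 * x0 * J_1(t) / t for f(x) = x**2 / 2.

    Uses the series 2*J_1(t)/t = sum_m (-1)**m * (t/2)**(2m) / (m! * (m+1)!),
    which is adequate for moderate t (roughly t <= 10 here).
    """
    q = (t / 2.0) ** 2
    total, term = 0.0, 1.0                         # m = 0 term equals 1
    for m in range(n_terms):
        total += term
        term *= -q / ((m + 1) * (m + 2))           # ratio of consecutive series terms
    return x0 * total


# Illustrative parameters (assumptions, not taken from the paper).
x0, s, n_steps = 1.0, 1e-4, 1000
xs = nesterov_quadratic(x0, s, n_steps)

# Compare iterate x_k with the ODE trajectory at time t = k * sqrt(s).
gap = max(abs(x - ode_solution(x0, k * math.sqrt(s))) for k, x in enumerate(xs))
print("largest |x_k - X(k*sqrt(s))| over the run:", gap)
```

Shrinking `s` while holding the time horizon `n_steps * sqrt(s)` fixed should shrink the gap, which is the sense in which the ODE is the limit of the discrete scheme.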
Citations
Convergence of first-order methods via the convex conjugate
TL;DR: This paper gives a unified and succinct approach to the convergence rates of the subgradient, gradient, and accelerated gradient methods for unconstrained convex minimization.
9
Inducing strong convergence of trajectories in dynamical systems associated to monotone inclusions with composite structure
TL;DR: The aim is to design methods which guarantee strong convergence of trajectories towards the minimum norm solution of the underlying monotone inclusion problem, and to investigate in detail the asymptotic behavior of dynamical systems perturbed by a Tikhonov regularization.
9
•Posted Content
Fixed-time Distributed Optimization under Time-Varying Communication Topology
TL;DR: In this paper, a nonlinear protocol for achieving distributed optimization for time-varying communication topology within a fixed time independent of the initial conditions is presented, where each agent in the network can access its private objective function, while exchange of local information is permitted between the neighbors.
9
•Posted Content
Tikhonov regularization of a second order dynamical system with Hessian driven damping
TL;DR: The asymptotic properties of the trajectories generated by a second-order dynamical system with Hessian driven damping and a Tikhonov regularization term are investigated and the derivation of strong convergence results of the trajectory to the minimizer of the objective function of minimum norm is obtained.
9
Hybrid Robust Optimal Resource Allocation with Momentum
Daniel E. Ochoa, Jorge I. Poveda, César A. Uribe, Nicanor Quijano
- 01 Dec 2019
TL;DR: A hybrid regularization is presented that induces the property of uniform asymptotic stability in the system by using the invariance principle for well-posed hybrid dynamical systems and establishing the existence of strictly positive margins of robustness with respect to arbitrarily small disturbances.
9
References
•Book
Distributed Optimization and Statistical Learning Via the Alternating Direction Method of Multipliers
Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein
- 23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems
Amir Beck, Marc Teboulle
TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically.
14.3K
•Proceedings Article
On the importance of initialization and momentum in deep learning
Ilya Sutskever, James Martens, George E. Dahl, Geoffrey E. Hinton
- 16 Jun 2013
TL;DR: It is shown that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs to levels of performance that were previously achievable only with Hessian-Free optimization.
5K
•Book
Proximal Algorithms
Neal Parikh, Stephen Boyd
- 27 Nov 2013
TL;DR: The many different interpretations of proximal operators and algorithms are discussed, their connections to many other topics in optimization and applied mathematics are described, some popular algorithms are surveyed, and a large number of examples of proximal operators that commonly arise in practice are provided.
4.2K
•Journal Article
A Treatise On The Theory Of Bessel Functions Ed.1st
Abstract: 1. Bessel functions before 1826 2. The Bessel coefficients 3. Bessel functions 4. Differential equations 5. Miscellaneous properties of Bessel functions 6. Integral representations of Bessel functions 7. Asymptotic expansions of Bessel functions 8. Bessel functions of large order 9. Polynomials associated with Bessel functions 10. Functions associated with Bessel functions 11. Addition theorems 12. Definite integrals 13. Infinite integrals 14. Multiple integrals 15. The zeros of Bessel functions 16. Neumann series and Lommel's functions of two variables 17. Kapteyn series 18. Series of Fourier-Bessel and Dini 19. Schlömilch series 20. The tabulation of Bessel functions Tables of Bessel functions Bibliography Indices.
4.2K