Stochastic Recursive Gradient Algorithm for Nonconvex Optimization

Open Access • Posted Content

- 20 May 2017

112

TL;DR: This paper studies and analyzes the mini-batch version of the StochAstic Recursive grAdient algoritHm (SARAH), a method that employs a stochastic recursive gradient, for empirical loss minimization with nonconvex losses. It provides a sublinear convergence rate for general nonconvex functions and a linear convergence rate for gradient dominated functions.
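
The stochastic recursive gradient mentioned in the TL;DR can be illustrated with a minimal sketch. It assumes the commonly stated form of the mini-batch SARAH update: start from a full gradient, then repeatedly add the mini-batch average of gradient differences between consecutive iterates. The names `sarah_epoch`, `grad_i`, and the toy quadratic loss are hypothetical and chosen for illustration only; the toy loss is convex so the behaviour is easy to check, whereas the paper's analysis targets nonconvex losses. Step size, inner-loop length, and batch size are arbitrary here, not the paper's recommended values.

```python
import random


def sarah_epoch(grad_i, n, w0, step, inner_steps, batch_size, rng):
    """One outer loop of mini-batch SARAH (illustrative sketch).

    grad_i(i, w) must return the gradient of the i-th component loss
    at point w, as a list of floats with the same length as w.
    Returns the list of all iterates, starting with w0.
    """
    dim = len(w0)

    # v_0: full gradient of the average loss at the starting point.
    v = [0.0] * dim
    for i in range(n):
        g = grad_i(i, w0)
        v = [vj + gj / n for vj, gj in zip(v, g)]

    w_prev = list(w0)
    w = [wj - step * vj for wj, vj in zip(w_prev, v)]
    iterates = [w_prev, w]

    for _ in range(inner_steps):
        # Sample a mini-batch of component indices without replacement.
        batch = rng.sample(range(n), batch_size)

        # Mini-batch average of gradient differences between the
        # current and the previous iterate.
        diff = [0.0] * dim
        for i in batch:
            g_new = grad_i(i, w)
            g_old = grad_i(i, w_prev)
            diff = [d + (a - b) / batch_size
                    for d, a, b in zip(diff, g_new, g_old)]

        # Recursive update: the new estimate reuses the previous one.
        v = [vj + dj for vj, dj in zip(v, diff)]

        w_prev, w = w, [wj - step * vj for wj, vj in zip(w, v)]
        iterates.append(w)

    return iterates


# Toy problem (an assumption for illustration):
# f_i(w) = 0.5 * (w - a_i)^2, whose average is minimized at mean(a).
data = [1.0, 2.0, 3.0, 6.0]


def grad_i(i, w):
    return [w[0] - data[i]]


iterates = sarah_epoch(grad_i, n=len(data), w0=[0.0], step=0.5,
                       inner_steps=20, batch_size=2, rng=random.Random(0))
```

On this toy loss every component has the same gradient difference, so the recursive estimate coincides with the full gradient and the iterates approach the mean of `data`. The sketch returns all iterates and leaves the choice of which one to report to the caller.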

Citations

•Posted Content

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning

Sai Praneeth Karimireddy, +5 more

- 14 Oct 2019

- arXiv: Learning

TL;DR: In this article, the authors propose a new algorithm (SCAFFOLD) that uses control variates (variance reduction) to correct for the "client-drift" in its local updates.

495

•Posted Content

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator

Cong Fang, +3 more

- 04 Jul 2018

- arXiv: Optimization and Control

TL;DR: This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.

363

•Posted Content

SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning.

Sai Praneeth Karimireddy, +5 more

- 14 Oct 2019

TL;DR: This paper proposes a new Stochastic Controlled Averaging algorithm (SCAFFOLD), which uses control variates to reduce the drift between different clients, and proves that the algorithm requires significantly fewer rounds of communication and benefits from favorable convergence guarantees.

360

•Proceedings Article

Momentum-Based Variance Reduction in Non-Convex SGD

Ashok Cutkosky, +1 more

- 24 May 2019

TL;DR: A new algorithm is presented, STORM, that does not require any batches and makes use of adaptive learning rates, enabling simpler implementation and less hyperparameter tuning.

309

•Proceedings Article

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator

Cong Fang, +3 more

- 01 Jan 2018

TL;DR: In this article, a stochastic path-integrated differential estimator (SPIDER) is proposed to track many deterministic quantities of interest with significantly reduced computational cost, and the SPIDER-SFO algorithm achieves a gradient computation cost of

293

References

•Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

53.5K

Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.

32.7K

•Dissertation

Learning Multiple Layers of Features from Tiny Images

Alex Krizhevsky

- 01 Jan 2009

TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images, using a dataset of millions of tiny colour images, described in the next section.

23.7K

•Journal Article•10.1214/AOMS/1177729586

A Stochastic Approximation Method

Herbert Robbins, +1 more

- 01 Sep 1951

- Annals of Mathematical Statistics

TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.

11.3K

•Journal Article

Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

John C. Duchi, +2 more

- 01 Feb 2011

- Journal of Machine Learning Research

TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.

8.9K

Related Papers (5)

Accelerating Stochastic Gradient Descent using Predictive Variance Reduction

Introductory Lectures on Convex Optimization: A Basic Course

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

A Stochastic Approximation Method

Minimizing finite sums with the stochastic average gradient