Open Access • Posted Content
Stochastic Recursive Gradient Algorithm for Nonconvex Optimization
TL;DR: This paper studies and analyzes the mini-batch version of the StochAstic Recursive grAdient algoritHm (SARAH), a method employing the stochastic recursive gradient, for empirical loss minimization with nonconvex losses. It provides a sublinear convergence rate for general nonconvex functions and a linear convergence rate for gradient dominated functions.
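As background (our addition, not quoted from the paper), with the empirical loss written as F(w) = (1/n) Σ f_i(w), F is usually called τ-gradient dominated when its suboptimality gap is bounded by the squared gradient norm, where w_* is a global minimizer. This is the Polyak-Łojasiewicz condition with μ = 1/(2τ); it forces every stationary point to be a global minimizer, which is why linear rates are possible without convexity.

```latex
F(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w), \qquad
F(w) - F(w_*) \le \tau \, \|\nabla F(w)\|^2 \quad \text{for all } w .
```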
Abstract: In this paper, we study and analyze the mini-batch version of the StochAstic Recursive grAdient algoritHm (SARAH), a method employing the stochastic recursive gradient, for empirical loss minimization with nonconvex losses. We provide a sublinear convergence rate (to stationary points) for general nonconvex functions and a linear convergence rate for gradient dominated functions, both of which have some advantages compared to other modern stochastic gradient algorithms for nonconvex losses.
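The abstract does not spell out the update, so here is a minimal Python sketch of mini-batch SARAH as we read it from the SARAH literature: each outer loop takes a full gradient v_0 at a snapshot w_0 and steps w_1 = w_0 - η v_0, then the inner loop updates the estimate recursively, v_t = v_{t-1} + (1/b) Σ_{i in I_t} [∇f_i(w_t) - ∇f_i(w_{t-1})], followed by w_{t+1} = w_t - η v_t. The nonconvex per-sample loss log(1 + r²), the step size, the loop lengths, and all helper names (`grad_i`, `sarah_minibatch`, etc.) are hypothetical choices for illustration, not values from the paper.

```python
import math
import random


def residual(w, x, y):
    # r = <x, w> - y for a single sample
    return sum(wj * xj for wj, xj in zip(w, x)) - y


def loss_i(w, x, y):
    # Hypothetical nonconvex per-sample loss: phi(r) = log(1 + r^2)
    r = residual(w, x, y)
    return math.log1p(r * r)


def grad_i(w, x, y):
    # Gradient of loss_i w.r.t. w: phi'(r) * x, with phi'(r) = 2r / (1 + r^2)
    r = residual(w, x, y)
    scale = 2.0 * r / (1.0 + r * r)
    return [scale * xj for xj in x]


def mean_loss(w, data):
    return sum(loss_i(w, x, y) for x, y in data) / len(data)


def full_grad(w, data):
    # (1/n) * sum_i grad f_i(w)
    g = [0.0] * len(w)
    for x, y in data:
        for j, gj in enumerate(grad_i(w, x, y)):
            g[j] += gj / len(data)
    return g


def sarah_minibatch(data, w0, eta=0.05, m=20, b=5, outer=30, seed=0):
    rng = random.Random(seed)
    w_tilde = list(w0)
    for _ in range(outer):
        w_prev = list(w_tilde)
        v = full_grad(w_prev, data)                       # v_0: full gradient at the snapshot
        w = [wj - eta * vj for wj, vj in zip(w_prev, v)]  # w_1 = w_0 - eta * v_0
        iterates = [w_prev, w]
        for _ in range(1, m):
            batch = rng.sample(data, b)                   # mini-batch I_t, drawn without replacement
            for x, y in batch:
                # v_t = v_{t-1} + (1/b) * sum_{i in I_t} [grad f_i(w_t) - grad f_i(w_{t-1})]
                diff = [gn - go for gn, go in zip(grad_i(w, x, y), grad_i(w_prev, x, y))]
                v = [vj + dj / b for vj, dj in zip(v, diff)]
            w_prev, w = w, [wj - eta * vj for wj, vj in zip(w, v)]
            iterates.append(w)
        w_tilde = rng.choice(iterates)                    # next snapshot: uniformly random inner iterate
    return w_tilde
```

Picking the next snapshot uniformly at random among the inner iterates mirrors the output rule commonly used in nonconvex analyses of this kind; using the last iterate is a frequent practical simplification.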
Citations
•Posted Content
SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
TL;DR: In this article, the authors propose a new algorithm, SCAFFOLD, which uses control variates (variance reduction) to correct for the "client-drift" in its local updates.
Cited by 495
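To make the "client-drift" correction concrete, here is a minimal sketch of SCAFFOLD with scalar parameters, deterministic client gradients, and every client participating in every round. Each local step uses g_i(y) - c_i + c instead of g_i(y), and the client control variate is refreshed as c_i ← c_i - c + (x - y_i)/(K·η_l), which is what the SCAFFOLD paper calls option II, to the best of our reading. The function name `scaffold`, its parameter names and defaults, and the toy quadratic clients in the test are hypothetical.

```python
def scaffold(client_grads, x0, rounds=100, local_steps=10, lr_local=0.05, lr_global=1.0):
    """Sketch of SCAFFOLD with full participation and scalar parameters.

    client_grads[i](y) returns client i's (here deterministic) gradient at y.
    """
    n = len(client_grads)
    x = x0
    c = 0.0              # server control variate
    c_i = [0.0] * n      # client control variates
    for _ in range(rounds):
        delta_y, delta_c = [], []
        for i, grad in enumerate(client_grads):
            y = x
            for _ in range(local_steps):
                # drift-corrected local step: g_i(y) - c_i + c
                y -= lr_local * (grad(y) - c_i[i] + c)
            # option II control-variate refresh
            new_ci = c_i[i] - c + (x - y) / (local_steps * lr_local)
            delta_y.append(y - x)
            delta_c.append(new_ci - c_i[i])
            c_i[i] = new_ci
        # server aggregation; with all n clients sampled, |S| / N = 1
        x += lr_global * sum(delta_y) / n
        c += sum(delta_c) / n
    return x
```

For example, with f_1(x) = x²/2 and f_2(x) = 2(x - 5)², the average loss is minimized at x = 4, and the corrected local steps no longer pull each client toward its own minimizer (0 or 5).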
•Posted Content
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path Integrated Differential Estimator
TL;DR: This paper proposes a new technique named SPIDER, which can be used to track many deterministic quantities of interest with significantly reduced computational cost, and proves that SPIDER-SFO nearly matches the algorithmic lower bound for finding approximate first-order stationary points under the gradient Lipschitz assumption in the finite-sum setting.
•Posted Content
SCAFFOLD: Stochastic Controlled Averaging for On-Device Federated Learning
Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh
- 14 Oct 2019
TL;DR: Proposes a new Stochastic Controlled Averaging algorithm (SCAFFOLD), which uses control variates to reduce the drift between different clients, and proves that the algorithm requires significantly fewer rounds of communication and benefits from favorable convergence guarantees.
Cited by 360
•Proceedings Article
Momentum-Based Variance Reduction in Non-Convex SGD
Ashok Cutkosky, Francesco Orabona
- 24 May 2019
TL;DR: A new algorithm, STORM, is presented that does not require any batches and uses adaptive learning rates, enabling simpler implementation and less hyperparameter tuning.
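A minimal sketch of the STORM estimator, written from our reading of the paper: the direction mixes a fresh stochastic gradient with a correction evaluated on the same sample at two consecutive points, d_{t+1} = ∇f(x_{t+1}; ξ_{t+1}) + (1 - a_{t+1})(d_t - ∇f(x_t; ξ_{t+1})), with adaptive step size η_t = k/(w + Σ G_i²)^{1/3}, where G_i is the norm of the i-th stochastic gradient, and a_{t+1} = c·η_t². The constants k, w, c, the clamp of a to at most 1, the function and parameter names, and returning the last iterate (rather than a randomly selected one, as such analyses often use) are assumptions for illustration.

```python
import random


def storm(stoch_grad, sample, x0, steps=2000, k=1.0, w=1.0, c=10.0, seed=0):
    """Sketch of STORM; stoch_grad(x, xi) is the gradient of f(x; xi), sample(rng) draws xi."""
    rng = random.Random(seed)
    x = list(x0)
    g = stoch_grad(x, sample(rng))
    d = g                                     # d_1 = grad f(x_1; xi_1)
    sum_sq = sum(gj * gj for gj in g)         # running sum of G_t^2
    for _ in range(steps):
        eta = k / (w + sum_sq) ** (1.0 / 3.0)             # adaptive step size eta_t
        x_next = [xj - eta * dj for xj, dj in zip(x, d)]
        a = min(1.0, c * eta * eta)                       # momentum weight a_{t+1}, clamped to <= 1
        xi = sample(rng)                                  # one fresh sample, used at both points
        g_next = stoch_grad(x_next, xi)
        g_prev = stoch_grad(x, xi)
        d = [gn + (1.0 - a) * (dj - gp) for gn, dj, gp in zip(g_next, d, g_prev)]
        sum_sq += sum(gj * gj for gj in g_next)
        x = x_next
    return x
```

Only one sample is drawn per step, which is what "does not require any batches" refers to; the cost is two gradient evaluations on that sample.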
•Proceedings Article
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Cong Fang, Chris Li, Zhouchen Lin, Tong Zhang
- 01 Jan 2018
TL;DR: In this article, a stochastic path-integrated differential estimator (SPIDER) is proposed to track many deterministic quantities of interest with significantly reduced computational cost, and the SPIDER-SFO algorithm achieves a gradient computation cost of
References
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 1998
TL;DR: In this article, gradient-based learning is shown to synthesize a complex decision surface that can classify high-dimensional patterns such as handwritten characters, and a new learning paradigm, graph transformer networks (GTN), is proposed for training multimodule recognition systems globally.
Cited by 53.5K
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 2001
TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.
Cited by 32.7K
•Dissertation
Learning Multiple Layers of Features from Tiny Images
Alex Krizhevsky
- 01 Jan 2009
TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images, using a dataset of millions of tiny colour images.
A Stochastic Approximation Method
Herbert Robbins, Sutton Monro
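TL;DR: In this article, a method is presented for making successive experiments at levels x_1, x_2, … in such a way that x_n tends in probability to θ, the solution of M(x) = α, where M(x) is the unknown, monotone expected response at level x and α is a given constant.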
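The scheme can be sketched in a few lines, assuming an increasing M and step sizes a_n = c/n (which satisfy Σ a_n = ∞ and Σ a_n² < ∞): x_{n+1} = x_n - a_n (y_n - α), where y_n is the noisy response observed at x_n. The names `robbins_monro` and `observe` and the constant c are hypothetical.

```python
import random


def robbins_monro(observe, alpha, x0, steps=5000, c=1.0, seed=0):
    """Find theta with M(theta) = alpha for an increasing M observed only with noise.

    observe(x, rng) returns a noisy measurement whose expectation is M(x).
    """
    rng = random.Random(seed)
    x = x0
    for n in range(1, steps + 1):
        y = observe(x, rng)
        # step size a_n = c / n; move down if the response is too high, up if too low
        x -= (c / n) * (y - alpha)
    return x
```

For example, with M(x) = 2x + 1 observed under Gaussian noise and α = 5, the iterates settle near θ = 2.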
•Journal Article
Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as those of the best proximal function that can be chosen in hindsight.
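For a concrete picture, here is a minimal sketch of the diagonal AdaGrad update in its simplest unconstrained form with no regularizer: each coordinate's step is divided by the square root of that coordinate's accumulated squared gradients. This is only a special case of the framework summarized above, which also covers full-matrix and composite (proximal) updates; the function name and the `eta` and `delta` defaults are illustrative assumptions.

```python
import math


def adagrad(grad, x0, steps=500, eta=1.0, delta=1e-8):
    """Diagonal AdaGrad, unconstrained and without a regularizer (simplified special case)."""
    x = list(x0)
    sum_sq = [0.0] * len(x)   # per-coordinate sum of squared past gradients
    for _ in range(steps):
        g = grad(x)
        for j, gj in enumerate(g):
            sum_sq[j] += gj * gj
            # coordinate-wise step eta / (delta + sqrt(accumulated squared gradients))
            x[j] -= eta * gj / (delta + math.sqrt(sum_sq[j]))
    return x
```

On f(x) = ½(x_1 - 3)² + 5(x_2 + 1)², the badly scaled second coordinate automatically receives a smaller effective step, so one `eta` works for both coordinates without manual tuning.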