Learning rate selection in stochastic gradient methods based on line search strategies
TL;DR: The authors analyse standard and line-search-based updating rules for fixing the learning rate sequence, also in relation to the size of the mini-batch used to compute the current stochastic gradient.
Abstract: Finite-sum problems appear as the sample average approximation of a stochastic optimization problem and often arise in machine learning applications with large-scale data sets. A very popular approach to finite-sum problems is the stochastic gradient method. It is well known that a proper strategy for selecting the hyperparameters of this method (i.e. the set of a-priori selected parameters), and in particular the learning rate, is needed to guarantee convergence properties and good practical performance. In this paper, we analyse standard and line-search-based updating rules for fixing the learning rate sequence, also in relation to the size of the mini-batch chosen to compute the current stochastic gradient. Extensive numerical experiments are carried out to evaluate the effectiveness of the discussed strategies on convex and non-convex finite-sum test problems, highlighting that the line-search-based methods avoid an expensive initial setting of the hyperparameters. The line-search-based approaches have also been applied to train a Convolutional Neural Network, providing very promising results.
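To make the idea of a line-search-based learning rate concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a least-squares finite-sum problem and a plain Armijo backtracking rule checked on the same mini-batch that produced the stochastic gradient. The names `batch_loss_grad` and `sgd_armijo`, and the values of `alpha0`, `c` and `shrink`, are illustrative assumptions.

```python
import random


def batch_loss_grad(w, A, b, idx):
    """Least-squares loss and gradient averaged over the mini-batch idx."""
    d = len(w)
    loss, grad = 0.0, [0.0] * d
    for i in idx:
        r = sum(A[i][j] * w[j] for j in range(d)) - b[i]  # residual of sample i
        loss += 0.5 * r * r
        for j in range(d):
            grad[j] += r * A[i][j]
    m = len(idx)
    return loss / m, [g / m for g in grad]


def sgd_armijo(A, b, batch_size=16, epochs=30, alpha0=1.0, c=0.5,
               shrink=0.5, max_backtracks=30, seed=0):
    """Mini-batch SGD whose learning rate is chosen by Armijo backtracking.

    All default values are assumptions made for this illustration.
    """
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    w = [0.0] * d
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            loss, grad = batch_loss_grad(w, A, b, idx)
            g2 = sum(g * g for g in grad)  # squared gradient norm
            alpha = alpha0  # restart from the initial trial step
            for _ in range(max_backtracks):
                trial = [wj - alpha * gj for wj, gj in zip(w, grad)]
                trial_loss, _ = batch_loss_grad(trial, A, b, idx)
                # Armijo sufficient decrease, tested on the same mini-batch
                if trial_loss <= loss - c * alpha * g2:
                    break
                alpha *= shrink  # step too long: shrink and retry
            w = [wj - alpha * gj for wj, gj in zip(w, grad)]
    return w
```

Calling `sgd_armijo(A, b)` with `A` as a list of rows and `b` as a list of targets returns the final iterate. The only learning-rate input is the initial trial step `alpha0`; the backtracking loop shortens it whenever the mini-batch loss does not decrease enough, at the price of extra loss evaluations per iteration.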
Citations
A stochastic gradient method with variance control and variable learning rate for Deep Learning
Giorgia Franchini, Federica Porta, Valeria Ruggiero, Ilaria Trombini, Luca Zanni
A variable metric proximal stochastic gradient method: an application to classification problems
Pasquale Cascarano, Giorgia Franchini, Erich Kobler, Federica Porta, Andrea Sebastiani
TL;DR: This paper introduces a variable metric proximal stochastic gradient method for supervised classification problems, incorporating automatic sample size selection and non-monotone line search, and provides convergence results for convex and non-convex objectives, outperforming state-of-the-art methods in numerical experiments.
A line-search based SGD algorithm with Adaptive Importance Sampling
Filippo Camellini, Serena Crisci, Anna De Magistris, Giorgia Franchini
Genetic Parameter and Hyper-Parameter Estimation Underlie Nitrogen Use Efficiency in Bread Wheat
Mohammad Bahman Sadeqi, Agim Ballvora, Said Dadshani, Jens Léon
TL;DR: This study has confirmed the results of bias–variance tradeoff and adaptive prediction error for the ensemble-learning-based model STACK, which has the highest performance when estimating genetic parameters and hyper-parameters in a given GS model compared to other models.
References
A Stochastic Approximation Method
Herbert Robbins, Sutton Monro
TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.
Optimization Methods for Large-Scale Machine Learning
TL;DR: The authors provide a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications, and discuss how optimization problems arise in machine learning and what makes them challenging.
Two-Point Step Size Gradient Methods
TL;DR: A study of new gradient descent methods for the approximate solution of the unconstrained minimization problem.
An introduction to optimization
Edwin K. P. Chong, Stanislaw H. Żak
Book, 01 Jan 2001
TL;DR: An Introduction to Optimization, Second Edition helps students build a solid working knowledge of the field, including unconstrained optimization, linear programming, and constrained optimization.
Sample size selection in optimization methods for machine learning
TL;DR: This paper presents a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient, and establishes an O(1/ε) complexity bound on the total cost of a gradient method.