An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

doi:10.48550/arxiv.2310.11291

Journal Article•10.48550/arxiv.2310.11291

Zhao Song, +1 more

- 17 Oct 2023

- arXiv.org

- Vol. abs/2310.11291

TL;DR: This research investigates the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization and proposes a novel approach called RDBD (Regrettable Delta-Bar-Delta), which allows for prompt correction of biased learning rate adjustments and ensures the convergence of the optimization process.

Abstract: The delta-bar-delta algorithm is a learning rate adaptation technique that speeds up training by dynamically scheduling the learning rate based on the difference between the current and previous weight updates. Although it is highly competitive with state-of-the-art algorithms such as Adam and SGD in full-batch optimization, it may encounter convergence issues in mini-batch optimization because of noisy gradients. In this study, we investigate the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization. To address these convergence issues, we propose a novel approach called RDBD (Regrettable Delta-Bar-Delta), which promptly corrects biased learning rate adjustments and ensures that the optimization process converges. Furthermore, we show that RDBD can be integrated with any optimization algorithm and significantly improves its convergence speed. Extensive experiments and evaluations validate the effectiveness and efficiency of RDBD: it overcomes convergence issues in mini-batch optimization and can speed up the convergence of various optimization algorithms. This research contributes to optimization techniques for neural network training by providing practitioners with a reliable automatic learning rate scheduler for faster convergence and better optimization outcomes.
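
For reference, here is a minimal pure-Python sketch of the classic per-parameter delta-bar-delta rule (Jacobs, 1988). It is not the RDBD algorithm, whose correction step is not described here. Each parameter keeps its own learning rate. That rate grows additively by `kappa` while the current gradient agrees in sign with an exponential average of past gradients (the "delta-bar", decay `theta`), and shrinks multiplicatively by `phi` when the signs disagree. The class name, the hyperparameter defaults, and the quadratic test function are illustrative assumptions.

```python
from typing import List


class DeltaBarDelta:
    """Classic delta-bar-delta: one adaptive learning rate per parameter.

    kappa: additive increase applied while gradient signs agree.
    phi:   fraction by which a learning rate is cut when signs disagree.
    theta: decay of the exponential average of past gradients (delta-bar).
    The default values are illustrative assumptions, not tuned settings.
    """

    def __init__(self, n_params: int, lr0: float = 0.01, kappa: float = 0.001,
                 phi: float = 0.5, theta: float = 0.7) -> None:
        self.kappa, self.phi, self.theta = kappa, phi, theta
        self.lrs: List[float] = [lr0] * n_params
        self.bar: List[float] = [0.0] * n_params  # delta-bar, starts at zero

    def step(self, w: List[float], grad: List[float]) -> List[float]:
        new_w = []
        for i, (wi, gi) in enumerate(zip(w, grad)):
            # Compare the current gradient with the average of past gradients.
            agreement = self.bar[i] * gi
            if agreement > 0:
                self.lrs[i] += self.kappa        # consistent direction: speed up
            elif agreement < 0:
                self.lrs[i] *= 1.0 - self.phi    # sign flip: likely overshoot, slow down
            # Fold the current gradient into delta-bar for the next step.
            self.bar[i] = (1.0 - self.theta) * gi + self.theta * self.bar[i]
            # Plain gradient step with the adapted per-parameter rate.
            new_w.append(wi - self.lrs[i] * gi)
        return new_w


def grad_quadratic(w: List[float]) -> List[float]:
    # Gradient of the ill-conditioned test function f(x, y) = x**2 + 10 * y**2.
    x, y = w
    return [2.0 * x, 20.0 * y]


if __name__ == "__main__":
    opt = DeltaBarDelta(n_params=2)
    w = [3.0, 1.0]
    for _ in range(200):
        w = opt.step(w, grad_quadratic(w))
    print("final w:", w, "per-parameter lrs:", opt.lrs)
```

The rule reacts only to sign agreement, so noisy mini-batch gradients can trigger spurious increases or cuts. This is one plausible reading of the convergence issues the abstract attributes to delta-bar-delta in mini-batch settings.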

Figures

Figure 1: A 2D simulation of RDBD optimization at step t.

Figure 2: Comparison of the Adam, Adam+RDBD, SGD, and RDBD algorithms on the MNIST and CIFAR-10 datasets.

Figure 3: Comparison of the DBD and RDBD algorithms on the CIFAR-10 and MNIST datasets.

Figure 4: Comparison of the loss for different initial learning rates.

Figure 5: Comparison of the loss for different batch sizes.

Citations

Journal Article•10.3390/a17070272

An Improved Adam’s Algorithm for Stomach Image Classification

Haijing Sun, +6 more

- 21 Jun 2024

- Algorithms

TL;DR: An improved Adam algorithm for stomach image classification achieves high accuracy by using a control restart strategy and a joint gradient-norm clipping technique to mitigate local optima, overfitting, and slow convergence.

References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
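
The TL;DR above summarizes Adam in one sentence. For comparison with the adaptive learning rate schedules discussed in the abstract, here is a minimal pure-Python sketch of the Adam update with bias-corrected first and second moment estimates. It uses the Adam paper's default hyperparameters (lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8). The class and method names are illustrative and do not reflect any library's API.

```python
import math
from typing import List


class Adam:
    """Minimal Adam: bias-corrected estimates of the first and second moments."""

    def __init__(self, n_params: int, lr: float = 0.001, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m: List[float] = [0.0] * n_params  # running mean of gradients
        self.v: List[float] = [0.0] * n_params  # running mean of squared gradients
        self.t = 0                              # step counter for bias correction

    def step(self, w: List[float], grad: List[float]) -> List[float]:
        self.t += 1
        new_w = []
        for i, (wi, gi) in enumerate(zip(w, grad)):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * gi
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * gi * gi
            # Undo the bias toward zero caused by initializing m and v at 0.
            m_hat = self.m[i] / (1.0 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1.0 - self.beta2 ** self.t)
            new_w.append(wi - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_w


if __name__ == "__main__":
    opt = Adam(n_params=2, lr=0.1)
    # On the first step m_hat = g and v_hat = g**2, so each parameter moves by
    # roughly lr against its gradient, whatever the gradient's scale.
    print(opt.step([1.0, -2.0], [4.0, -0.5]))
```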

•Journal Article•10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

•Posted Content

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, +20 more

- 03 Dec 2019

- arXiv: Learning

TL;DR: PyTorch is a machine learning library that provides an imperative, Pythonic programming style, which makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs.

•Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

Vinod Nair, +1 more

- 21 Jun 2010

TL;DR: Restricted Boltzmann machines were originally developed with binary stochastic hidden units; replacing these with noisy rectified linear units yields features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

•Posted Content

Decoupled Weight Decay Regularization

Ilya Loshchilov, +1 more

- 14 Nov 2017

- arXiv: Learning

TL;DR: This work proposes a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function, and provides empirical evidence that this modification substantially improves Adam's generalization performance.
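
The TL;DR above contrasts L2 regularization with decoupled weight decay. Here is a minimal sketch of that difference inside an Adam-style update, under two stated assumptions. First, the decay term is scaled by the learning rate, following the convention of PyTorch's `torch.optim.AdamW`; the paper writes it with a separate schedule multiplier. Second, the function names `init_state` and `adam_step` are hypothetical.

```python
import math
from typing import List


def init_state(n_params: int) -> dict:
    # Hypothetical helper: moment buffers and step counter shared across calls.
    return {"t": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}


def adam_step(w: List[float], grad: List[float], state: dict, lr: float = 0.001,
              weight_decay: float = 0.0, decoupled: bool = True,
              beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> List[float]:
    state["t"] += 1
    t = state["t"]
    new_w = []
    for i, (wi, gi) in enumerate(zip(w, grad)):
        if not decoupled:
            # Classic L2 regularization: the penalty gradient enters the moment
            # estimates and is therefore rescaled by the adaptive denominator.
            gi = gi + weight_decay * wi
        state["m"][i] = beta1 * state["m"][i] + (1.0 - beta1) * gi
        state["v"][i] = beta2 * state["v"][i] + (1.0 - beta2) * gi * gi
        m_hat = state["m"][i] / (1.0 - beta1 ** t)
        v_hat = state["v"][i] / (1.0 - beta2 ** t)
        step = lr * m_hat / (math.sqrt(v_hat) + eps)
        if decoupled:
            # Decoupled (AdamW-style) decay: shrink the weight directly,
            # outside the adaptive update.
            step += lr * weight_decay * wi
        new_w.append(wi - step)
    return new_w


if __name__ == "__main__":
    for decoupled in (True, False):
        w = adam_step([1.0], [0.0], init_state(1), lr=0.1,
                      weight_decay=0.01, decoupled=decoupled)
        print("decoupled" if decoupled else "L2 inside Adam", w)
```

With a zero loss gradient, decoupled decay shrinks the weight by exactly `lr * weight_decay` (from 1.0 to 0.999). In the L2 variant, Adam normalizes the tiny penalty gradient into a step of roughly `lr` (from 1.0 to about 0.9). This illustrates why the two formulations are not equivalent for adaptive optimizers.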
