A modified Adam algorithm for deep neural network optimization

doi:10.1007/s00521-023-08568-z

Open Access • Journal Article


M. M. Reyad, +2 more

- 25 Apr 2023

- Neural Computing and Applications

- Vol. 35, pp 17095-17112

Cited by 134

TL;DR: This article proposes HN_Adam, a modified version of the Adam algorithm that improves the generalization performance of deep neural networks by adjusting the step size of the parameter updates over the training epochs.

Abstract: Deep Neural Networks (DNNs) are widely regarded as the most effective learning tool for dealing with large datasets, and they have been successfully used in thousands of applications across a variety of fields. Trained on these large datasets, they learn the relationships between various variables. The adaptive moment estimation (Adam) algorithm, a highly efficient adaptive optimization algorithm, is widely used to train DNN models in many fields. However, its generalization performance needs improvement, especially when training on large-scale datasets. Therefore, in this paper, we propose HN_Adam, a modified version of the Adam algorithm, to improve its accuracy and convergence speed. HN_Adam automatically adjusts the step size of the parameter updates over the training epochs. This adjustment is based on the norm value of the parameter update formula, computed from the gradient values obtained during the training epochs. Furthermore, a hybrid mechanism was created by combining the standard Adam algorithm and the AMSGrad algorithm. As a result of these changes, HN_Adam has good generalization performance, like the stochastic gradient descent (SGD) algorithm, and achieves fast convergence, like other adaptive algorithms. To evaluate its performance, HN_Adam is used to train a deep convolutional neural network (CNN) model that classifies images on two standard datasets: MNIST and CIFAR-10. The results are compared with those of the basic Adam algorithm and the SGD algorithm, as well as five other recent adaptive SGD algorithms. In most comparisons, HN_Adam outperforms the compared algorithms in terms of accuracy and convergence speed. AdaBelief is the most competitive of the compared algorithms.
In terms of testing accuracy and convergence speed (represented by the consumed training time), HN_Adam outperforms AdaBelief by 1.0% and 0.29%, respectively, on the MNIST dataset, and by 0.93% and 1.68%, respectively, on the CIFAR-10 dataset.
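The abstract describes HN_Adam only at a high level (a norm-based step-size adjustment plus a hybrid of Adam and AMSGrad) and does not give its exact update rule. As background, the sketch below shows one step of the standard Adam update and of its AMSGrad variant, and also returns the L2 norm of the applied update, the kind of quantity the abstract names as the basis for HN_Adam's adjustment. It is not the authors' HN_Adam: the names `AdamState`, `init_state` and `adam_step` are hypothetical, and the defaults (lr = 1e-3, β1 = 0.9, β2 = 0.999, ε = 1e-8) are the values suggested in the original Adam paper, assumed here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AdamState:
    """Per-parameter optimizer state (illustrative sketch, not HN_Adam's)."""
    m: list      # first-moment (mean) estimate of the gradient
    v: list      # second-moment (uncentered variance) estimate
    v_max: list  # running maximum of v, used only by the AMSGrad variant
    t: int = 0   # number of steps taken so far


def init_state(n: int) -> AdamState:
    """Zero-initialised state for a parameter vector of length n."""
    return AdamState(m=[0.0] * n, v=[0.0] * n, v_max=[0.0] * n)


def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, amsgrad=False):
    """Apply one Adam step (amsgrad=False) or one AMSGrad step (amsgrad=True).

    Returns the updated parameters and the L2 norm of the applied update.
    """
    state.t += 1
    bias1 = 1.0 - beta1 ** state.t
    bias2 = 1.0 - beta2 ** state.t
    new_params, sq_norm = [], 0.0
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient.
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g
        v_used = state.v[i]
        if amsgrad:
            # AMSGrad: the second-moment estimate used in the denominator
            # never decreases, so the per-coordinate step cannot grow back.
            state.v_max[i] = max(state.v_max[i], state.v[i])
            v_used = state.v_max[i]
        m_hat = state.m[i] / bias1  # bias-corrected first moment
        v_hat = v_used / bias2      # bias-corrected second moment
        step = lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p - step)
        sq_norm += step * step
    return new_params, math.sqrt(sq_norm)


if __name__ == "__main__":
    # Minimise f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
    x, state = [0.0], init_state(1)
    for _ in range(1000):
        x, update_norm = adam_step(x, [2.0 * (x[0] - 3.0)], state, lr=0.1)
    print(x[0], update_norm)
```

Setting `amsgrad=True` only changes which second-moment estimate enters the denominator. A hybrid such as the one the abstract describes would also need a rule for switching or blending between the two (and for scaling the step from the update norm), which the abstract does not specify.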

Citations

Journal Article • doi:10.48550/arxiv.2311.13160

Large Language Models in Education: Vision and Opportunities

Wensheng Gan, +3 more

- 22 Nov 2023

- arXiv.org

TL;DR: This article aims to investigate and summarize the application of LLMs in smart education, and provides guidance and insights for educators, researchers, and policy-makers to gain a deep understanding of the potential and challenges of LLM4Edu.

Cited by 39

Journal Article • doi:10.1109/bigdata59044.2023.10386291

Large Language Models in Education: Vision and Opportunities

Wensheng Gan, +3 more

- 15 Dec 2023

TL;DR: Large language models in education offer personalized learning, intelligent tutoring, and educational assessment opportunities, improving the quality of education and learning experience.

Cited by 23

Journal Article • doi:10.3934/mbe.2024054

The WuC-Adam algorithm based on joint improvement of Warmup and cosine annealing algorithms

Can Zhang, +5 more

- 01 Jan 2024

TL;DR: This study introduces WuC-Adam, an enhanced Adam optimization algorithm integrating Warmup and cosine annealing techniques to alleviate local optima, overfitting, and convergence issues, achieving significant improvements in model convergence speed and generalization performance on MNIST, CIFAR10, and CIFAR100 datasets.

Cited by 16

Journal Article • doi:10.1109/access.2024.3385099

Attention to Monkeypox: An Interpretable Monkeypox Detection Technique Using Attention Mechanism

Avi Deb Raha, +6 more

- IEEE Access

TL;DR: An attention-based MobileNetV2 model for monkeypox detection, capitalizing on the inherent lightweight design of MobileNetV2 for effective deployment on edge devices, is proposed, and demonstrates impressive results.

Cited by 12

Journal Article • doi:10.1038/s41928-024-01280-3

Reconfigurable in-sensor processing based on a multi-phototransistor–one-memristor array

Bingjie Dang, +5 more

- 12 Nov 2024

- Nature Electronics

Cited by 12

...

References

Proceedings Article • doi:10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual networks won 1st place in the ILSVRC 2015 classification task.

Cited by 198.7K

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Cited by 138.5K

Posted Content

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 10 Dec 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and provides comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth.

Cited by 117.9K

Journal Article • doi:10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes; it employed a recently developed regularization method called "dropout" that proved to be very effective.

Cited by 98.2K

Proceedings Article • doi:10.1109/CVPR.2009.5206848

ImageNet: A large-scale hierarchical image database

Jia Deng, +5 more

- 20 Jun 2009

TL;DR: This work introduces a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure that is much larger in scale and diversity, and much more accurate, than existing image datasets.

Cited by 75.9K