Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Open Access • Posted Content

- 30 May 2019

274

TL;DR: The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.
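The NTRF model is the first-order expansion of the network around its random initialization $W_0$: $f(x) = f_{W_0}(x) + \langle \nabla_W f_{W_0}(x), W - W_0 \rangle$, which is linear in the offset $W - W_0$. The sketch below illustrates this for a hypothetical two-layer ReLU network with m hidden units, N(0, 1) hidden weights, fixed ±1 output weights and a 1/sqrt(m) output scaling. These choices are assumptions for illustration, not the paper's deep architecture or scaling, and the code does not compute the generalization bound itself.

```python
import math
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def init_two_layer(m, d, seed=0):
    """Hypothetical random init: N(0, 1) hidden weights, fixed +/-1 output weights."""
    rng = random.Random(seed)
    W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    return W0, a


def network(W, a, x):
    """f_W(x) = (1/sqrt(m)) * sum_r a_r * relu(<w_r, x>); only W is trained."""
    return sum(a_r * max(0.0, dot(w_r, x)) for w_r, a_r in zip(W, a)) / math.sqrt(len(W))


def gradient_features(W0, a, x):
    """Row r of grad_W f_{W0}(x): (a_r / sqrt(m)) * 1[<w_r, x> > 0] * x."""
    scale = 1.0 / math.sqrt(len(W0))
    return [[scale * a_r * (1.0 if dot(w_r, x) > 0 else 0.0) * xi for xi in x]
            for w_r, a_r in zip(W0, a)]


def ntrf_predict(W0, a, U, x):
    """NTRF model: f_{W0}(x) + <grad_W f_{W0}(x), U>, linear in the offset U = W - W0."""
    G = gradient_features(W0, a, x)
    return network(W0, a, x) + sum(dot(g_r, u_r) for g_r, u_r in zip(G, U))


if __name__ == "__main__":
    m, d = 512, 5
    W0, a = init_two_layer(m, d)
    x = [0.2, -0.4, 0.1, 0.5, -0.3]
    rng = random.Random(1)
    # A small offset: for ReLU the expansion is exact unless some activation pattern flips.
    U = [[1e-6 * rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    W = [[w + u for w, u in zip(w_r, u_r)] for w_r, u_r in zip(W0, U)]
    print(network(W, a, x), ntrf_predict(W0, a, U, x))
```

Because the NTRF prediction is linear in U, fitting it to training data is a linear (random feature) problem, while the network itself stays nonlinear in W.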

Citations

•Posted Content

Obtaining Adjustable Regularization for Free via Iterate Averaging

Jingfeng Wu, +2 more

- 15 Aug 2020

- arXiv: Learning

TL;DR: An averaging scheme is established that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter and shows that the same methods work empirically on more general optimization objectives including neural networks.

1
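One way to see how averaging iterates can produce regularization: for plain gradient descent started at zero on the quadratic (1/2) w^T H w - b^T w with step size η, weighting the iterates geometrically with weights (1 - q) q^k gives, in the limit of infinitely many iterates, exactly the ridge solution (H + λI)^{-1} b with λ = (1 - q) / (q η). The sketch below checks this numerically in two dimensions. Deterministic GD, the quadratic objective and the geometric weights are simplifying assumptions, so this is not necessarily the exact scheme or setting (SGD, general strongly convex objectives) analyzed in the paper.

```python
def matvec(H, w):
    return [H[0][0] * w[0] + H[0][1] * w[1], H[1][0] * w[0] + H[1][1] * w[1]]


def ridge_solution(H, b, lam):
    """Solve (H + lam*I) w = b for a 2x2 system by Cramer's rule."""
    a11, a12 = H[0][0] + lam, H[0][1]
    a21, a22 = H[1][0], H[1][1] + lam
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]


def geometric_average_of_gd(H, b, eta, q, num_steps):
    """Run GD on 0.5*w^T H w - b^T w from w_0 = 0 and return sum_k (1-q) q^k w_k (truncated)."""
    w = [0.0, 0.0]
    avg = [0.0, 0.0]
    weight = 1.0 - q  # weight of w_0
    for _ in range(num_steps):
        avg = [avg[i] + weight * w[i] for i in range(2)]
        grad = [hw - bi for hw, bi in zip(matvec(H, w), b)]
        w = [w[i] - eta * grad[i] for i in range(2)]
        weight *= q
    return avg


if __name__ == "__main__":
    H = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive definite
    b = [1.0, -1.0]
    eta, q = 0.1, 0.9
    lam = (1.0 - q) / (q * eta)  # implied ridge parameter, about 1.11 here
    print(geometric_average_of_gd(H, b, eta, q, num_steps=2000))
    print(ridge_solution(H, b, lam))
```

Changing q after training only changes the weights, so one GD run yields a family of ridge solutions with different λ; this is the "adjustable" part in this simplified setting.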

Journal Article•10.48550/arxiv.2310.07497

Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing

Minh Ngoc Luu, +6 more

- 11 Oct 2023

- arXiv.org

TL;DR: This work first formulates an optimization problem that harnesses the sampling process to concurrently reduce overfitting while maximizing accuracy, and then introduces an online reinforcement learning algorithm, named Sample-driven Control for Federated Learning (SCFL), built on the Soft Actor-Critic (SAC) framework.

1

Journal Article•10.48550/arxiv.2310.11291

An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent

Zhao Song, +1 more

- 17 Oct 2023

- arXiv.org

TL;DR: This research investigates the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization and proposes a novel approach called RDBD (Regrettable Delta-Bar-Delta), which allows for prompt correction of biased learning rate adjustments and ensures the convergence of the optimization process.

1
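For context, the classic delta-bar-delta rule (Jacobs, 1988) keeps one learning rate per parameter: it compares the current gradient with an exponential average of past gradients, adds a constant κ to the rate when their signs agree, and shrinks it by a factor (1 - φ) when they disagree. The sketch below implements that classic rule on a toy quadratic; the hyperparameter values are arbitrary, the update order is one common reading of the rule, and it is not the paper's RDBD correction.

```python
def delta_bar_delta(grad_fn, w, lr0=0.01, kappa=0.001, phi=0.1, theta=0.7, steps=200):
    """Classic delta-bar-delta: per-parameter learning rates adapted from gradient signs."""
    w = list(w)
    lr = [lr0] * len(w)   # one learning rate per parameter
    bar = [0.0] * len(w)  # exponential average of past gradients ("delta-bar")
    for _ in range(steps):
        g = grad_fn(w)
        for i in range(len(w)):
            agreement = bar[i] * g[i]
            if agreement > 0:
                lr[i] += kappa      # consistent sign: grow the rate additively
            elif agreement < 0:
                lr[i] *= 1.0 - phi  # sign flip: shrink the rate multiplicatively
            w[i] -= lr[i] * g[i]
            bar[i] = (1.0 - theta) * g[i] + theta * bar[i]
    return w, lr


if __name__ == "__main__":
    def quadratic_grad(w):
        # Gradient of the ill-conditioned toy objective 0.5 * (10 * w0^2 + w1^2).
        return [10.0 * w[0], w[1]]

    w, lr = delta_bar_delta(quadratic_grad, [1.0, 1.0])
    print(w, lr)
```

The additive increase and multiplicative decrease make the rates grow slowly along consistent directions and drop quickly after an overshoot.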

Sharper analysis of sparsely activated wide neural networks with trainable biases

Hongru Yang, +3 more

TL;DR: In this paper, the biases are initialized to a constant rather than zero and are trainable; the analysis ensures that the network after sparsification can converge as fast as the original network and provides a generalization bound for the network after training.

1
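A back-of-the-envelope calculation shows how a constant bias initialization controls sparse activation: if a hidden unit computes relu(<w, x> + b) with w ~ N(0, I_d), a unit-norm input x, and b initialized to the constant -B, then <w, x> ~ N(0, 1) and the unit is active with probability 1 - Φ(B). The sketch below compares that closed form with a Monte Carlo count. The Gaussian initialization, the unit-norm input and the negative sign of the bias are assumptions for illustration, not the paper's exact setup.

```python
import math
import random


def active_fraction_closed_form(B):
    """P(<w, x> - B > 0) for <w, x> ~ N(0, 1), i.e. 1 - Phi(B)."""
    return 0.5 * math.erfc(B / math.sqrt(2.0))


def active_fraction_monte_carlo(B, m=20000, d=8, seed=0):
    """Fraction of m hidden units with relu(<w_r, x> - B) > 0 at initialization."""
    rng = random.Random(seed)
    x = [1.0 / math.sqrt(d)] * d  # a fixed unit-norm input
    active = 0
    for _ in range(m):
        w = [rng.gauss(0.0, 1.0) for _ in range(d)]
        if sum(wi * xi for wi, xi in zip(w, x)) - B > 0:
            active += 1
    return active / m


if __name__ == "__main__":
    for B in (0.0, 1.0, 2.0):
        print(B, active_fraction_closed_form(B), active_fraction_monte_carlo(B))
```

With B = 0 half the units fire; raising B makes the active fraction shrink quickly, which is what allows sparse computation in wide networks.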

Journal Article•10.48550/arXiv.2208.05363

Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium

Chris Junchi Li, +3 more

- 10 Aug 2022

- arXiv.org

TL;DR: A novel online learning algorithm is proposed that attains $O(\sqrt{T})$ regret with polynomial computational complexity, under very mild assumptions on the reward function and the underlying dynamics of the Markov games.

1

...

References

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors trained a deep convolutional neural network that achieved state-of-the-art performance on ImageNet classification; the network consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K
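The layer stack in the ImageNet TL;DR can be checked with the standard output-size formula floor((n + 2p - k) / s) + 1. The sketch below traces spatial sizes through five convolutional layers (96, 256, 384, 384 and 256 filters), max-pooling after the first, second and fifth, and three fully-connected layers ending in 1000 outputs. The 227×227 input, the paddings and the 3×3 stride-2 pooling follow common single-GPU reimplementations and are assumptions here; the original network was split across two GPUs.

```python
def out_size(n, k, s=1, p=0):
    """Spatial output size of a convolution or pooling window."""
    return (n + 2 * p - k) // s + 1


# (name, kernel, stride, padding, output channels); None keeps the input depth.
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]
FC_SIZES = [4096, 4096, 1000]  # the last layer feeds the 1000-way softmax


def trace_shapes(n=227, channels=3):
    """Return per-layer (name, height, width, channels) and the flattened size."""
    shapes = []
    for name, k, s, p, c_out in LAYERS:
        n = out_size(n, k, s, p)
        channels = channels if c_out is None else c_out
        shapes.append((name, n, n, channels))
    return shapes, n * n * channels


if __name__ == "__main__":
    shapes, flat = trace_shapes()
    for shape in shapes:
        print(shape)
    print("flattened:", flat, "-> fc:", FC_SIZES)
```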

Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, gradient-based learning is shown to synthesize complex decision surfaces that can classify high-dimensional patterns, such as handwritten characters, and a graph transformer network (GTN) is proposed for training multimodule recognition systems globally with gradient-based methods.

53.5K

Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.

32.7K

•Proceedings Article•10.1109/ICCV.2015.123

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

- 07 Dec 2015

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) is proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk; the resulting models achieve a 4.94% top-5 test error on the ImageNet 2012 classification dataset.

18.2K
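PReLU replaces the zero slope of ReLU on negative inputs with a learned coefficient a: f(y) = y for y > 0 and f(y) = a·y otherwise, so ∂f/∂a is 0 for y > 0 and y otherwise, which is what lets a be trained by backpropagation alongside the weights. The sketch below writes these formulas out for a single scalar, with a finite-difference check of the gradient with respect to a. It is a minimal illustration, not the paper's channel-wise or channel-shared implementation, and the slope value is chosen only for illustration.

```python
def prelu(y, a):
    """PReLU: identity for positive inputs, learned slope a for the rest."""
    return y if y > 0 else a * y


def prelu_grads(y, a):
    """Return (df/dy, df/da) for a single scalar input y."""
    if y > 0:
        return 1.0, 0.0
    return a, y


def finite_difference_da(y, a, h=1e-6):
    """Central-difference estimate of df/da, for checking prelu_grads."""
    return (prelu(y, a + h) - prelu(y, a - h)) / (2.0 * h)


if __name__ == "__main__":
    a = 0.25  # illustrative slope value
    for y in (-2.0, -0.5, 1.5):
        print(y, prelu(y, a), prelu_grads(y, a), finite_difference_da(y, a))
```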