Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Open Access•Posted Content

- 30 May 2019

274

TL;DR: The expected $0$-$1$ loss of a sufficiently wide ReLU network trained with stochastic gradient descent from random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, called the neural tangent random feature (NTRF) model.
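
As a rough illustration of the NTRF idea summarized above, the following Python sketch builds the random feature model for a two-layer ReLU network. The two-layer architecture, the $1/\sqrt{m}$ output scaling, the Gaussian initialization, and all function names are assumptions made for illustration only; the paper itself treats deep networks and may use different scaling.

```python
import numpy as np

def init_params(d, m, seed=0):
    """Random initialization (assumed: Gaussian hidden weights, fixed +/-1 output weights)."""
    rng = np.random.default_rng(seed)
    W0 = rng.normal(size=(m, d))         # hidden-layer weights at initialization
    a = rng.choice([-1.0, 1.0], size=m)  # output weights, kept fixed in this sketch
    return W0, a

def forward(W, a, x):
    """Two-layer ReLU network f_W(x) = a . relu(W x) / sqrt(m)."""
    m = W.shape[0]
    return float(a @ np.maximum(W @ x, 0.0)) / np.sqrt(m)

def grad_W(W, a, x):
    """Gradient of f_W(x) with respect to W, used as the random feature map."""
    m = W.shape[0]
    active = (W @ x > 0.0).astype(float)  # ReLU activation pattern
    return np.outer(a * active, x) / np.sqrt(m)

def ntrf_predict(W, W0, a, x):
    """Linearized model f_{W0}(x) + <grad f_{W0}(x), W - W0>."""
    feature = grad_W(W0, a, x)            # features are fixed at initialization
    return forward(W0, a, x) + float(np.sum(feature * (W - W0)))
```

In this sketch the features depend only on the initialization `W0`, so the prediction is linear in `W`, and at `W = W0` it coincides with the network output.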

Citations

•Posted Content

When can Wasserstein GANs minimize Wasserstein Distance

Yuanzhi Li, +1 more

- 09 Mar 2020

TL;DR: It is shown that when the generator is a class of two-layer neural networks, then it is necessary and sufficient for the discriminator to be a one-layer network with ReLU-type activation functions, and when the training stops, the generator will indeed output a distribution that is inverse-polynomially close to the target.

8

•Journal Article•10.48550/arXiv.2205.14309

Federated Neural Bandit

Zhongxiang Dai, +5 more

- arXiv.org

TL;DR: The federated neural-upper confidence bound (FN-UCB) algorithm is introduced, which adopts a weighted combination of two UCBs: UCB$^a$ allows every agent to additionally use the observations from the other agents to accelerate exploration (without sharing raw observations); UCB$^b$ uses an NN with aggregated parameters for reward prediction in a similar way as federated averaging for supervised learning.

7

•Journal Article•10.1007/s10994-022-06192-x

Stabilize deep ResNet with a sharp scaling factor $\tau$

Huishuai Zhang, +4 more

- 01 Aug 2022

- Machine Learning

TL;DR: In this paper, the authors study the stability and convergence of training deep ResNets with gradient descent and show that the parametric branch in the residual block should be scaled down by a factor of O(1/L) to guarantee a stable forward/backward process, where L is the number of residual blocks.

7

•Proceedings Article•10.1145/3485447.3512250

Learning Neural Ranking Models Online from Implicit User Feedback

Yiling Jia, +1 more

- 17 Jan 2022

TL;DR: This work proposes to directly learn a neural ranking model from users’ implicit feedback, focusing on RankNet and LambdaRank, and proves that under standard assumptions the OL2R solution achieves a gap-dependent upper regret bound of $O(\log^2(T))$, in which the regret is defined on the total number of mis-ordered pairs over T rounds.

7

•Posted Content

Early-stopped neural networks are consistent

Ziwei Ji, +2 more

- 10 Jun 2021

- arXiv: Learning

TL;DR: This article studies the behavior of neural networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general and the (optimal) Bayes risk is not necessarily zero.

7

References

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.

88.4K

•Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

53.5K

Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.

32.7K

•Proceedings Article•10.1109/ICCV.2015.123

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

- 07 Dec 2015

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on ImageNet 2012 classification dataset.

18.2K