Open Access, Posted Content
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
Yuan Cao, Quanquan Gu
TL;DR: The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.
Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., the number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by an NTRF model with sufficiently small error, our result yields a generalization error bound of order $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
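To make the idea of a random feature model "induced by the network gradient at initialization" concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not the paper's construction: it uses a two-layer ReLU network rather than a deep one, the initialization variances and the helper names (`forward`, `ntrf_features`, `ntrf_predict`) are hypothetical, and no width-dependent output scaling is applied. The feature vector of an input is the gradient of the network output with respect to all parameters at initialization, and the NTRF-style model is the first-order expansion of the network around that initialization.

```python
import numpy as np

def forward(W1, w2, x):
    """Two-layer ReLU network output f(x) = w2 . relu(W1 x)."""
    return float(w2 @ np.maximum(W1 @ x, 0.0))

def ntrf_features(W1, w2, x):
    """Gradient of f with respect to (W1, w2), flattened into one vector."""
    pre = W1 @ x
    active = (pre > 0).astype(float)       # ReLU activation pattern
    grad_W1 = np.outer(w2 * active, x)     # df/dW1, shape (m, d)
    grad_w2 = np.maximum(pre, 0.0)         # df/dw2, shape (m,)
    return np.concatenate([grad_W1.ravel(), grad_w2])

def ntrf_predict(W1, w2, x, delta):
    """First-order model: f at initialization plus <features, delta>."""
    return forward(W1, w2, x) + float(ntrf_features(W1, w2, x) @ delta)

# Random initialization (the variances here are an assumption of this sketch).
rng = np.random.default_rng(0)
d, m = 5, 64
W1 = rng.normal(0.0, np.sqrt(2.0 / m), size=(m, d))
w2 = rng.normal(0.0, np.sqrt(1.0 / m), size=m)
x = rng.normal(size=d)
x /= np.linalg.norm(x)                     # unit-norm input

# Compare the true network and the linearized model after a small
# perturbation of the parameters.
delta = 1e-4 * rng.normal(size=m * d + m)
dW1 = delta[: m * d].reshape(m, d)
dw2 = delta[m * d:]
net_out = forward(W1 + dW1, w2 + dw2, x)
ntrf_out = ntrf_predict(W1, w2, x, delta)
print("gap between network and linearized model:", abs(net_out - ntrf_out))
```

For parameters close to initialization the two outputs nearly coincide, which is the intuition behind comparing a trained wide network to a model that is linear in the gradient features.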
Citations
•Posted Content
When can Wasserstein GANs minimize Wasserstein Distance
Yuanzhi Li, Zehao Dou
- 09 Mar 2020
TL;DR: It is shown that when the generator is a class of two-layer neural networks, then it is necessary and sufficient for the discriminator to be a one-layer network with ReLU-type activation functions, and when the training stops, the generator will indeed output a distribution that is inverse-polynomially close to the target.
8
Federated Neural Bandit
TL;DR: The federated neural-upper confidence bound (FN-UCB) algorithm is introduced, which adopts a weighted combination of two UCBs: $\text{UCB}^a$ allows every agent to additionally use the observations from the other agents to accelerate exploration (without sharing raw observations); $\text{UCB}^b$ uses an NN with aggregated parameters for reward prediction in a similar way as federated averaging for supervised learning.
7
Stabilize deep ResNet with a sharp scaling factor $\tau$
TL;DR: This paper studies the stability and convergence of training deep ResNets with gradient descent and shows that the parametric branch in the residual block should be scaled down by a factor of O(1/L) to guarantee a stable forward/backward process, where L is the number of residual blocks.
Learning Neural Ranking Models Online from Implicit User Feedback
Yiling Jia, Hongning Wang
- 17 Jan 2022
TL;DR: This work proposes to directly learn a neural ranking model from users’ implicit feedback, focusing on RankNet and LambdaRank, and proves that under standard assumptions the OL2R solution achieves a gap-dependent upper regret bound of $O(\log^2(T))$, in which the regret is defined on the total number of mis-ordered pairs over T rounds.
•Posted Content
Early-stopped neural networks are consistent
TL;DR: This article studies the behavior of neural networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general and the (optimal) Bayes risk is not necessarily zero.
7
References
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: The authors achieve state-of-the-art ImageNet classification performance with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 1998
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
53.5K
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 2001
TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.
32.7K
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 07 Dec 2015
TL;DR: This paper proposes the Parametric Rectified Linear Unit (PReLU), which improves model fitting with nearly zero extra computational cost and little overfitting risk, and reports a 4.94% top-5 test error on the ImageNet 2012 classification dataset.