Open Access | Posted Content
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
Yuan Cao, Quanquan Gu
TL;DR: The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.
Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound on the order of $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
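To make the phrase "random feature model induced by the network gradient at initialization" concrete, the following is a minimal sketch rather than the construction used in the paper. It assumes a two-layer ReLU network in which only the first-layer weights `W` vary, a `1/sqrt(m)` output scaling, and standard Gaussian initialization; the function names `network`, `grad_W`, and `ntrf` are hypothetical. The NTRF-style model is taken here to be the first-order expansion of the network around its initial weights `W0`, with the gradient at `W0` acting as a fixed random feature map.

```python
import numpy as np

def network(x, W, a):
    """Two-layer ReLU network f_W(x) = a^T relu(W x) / sqrt(m) (assumed form)."""
    m = W.shape[0]
    return a @ np.maximum(W @ x, 0.0) / np.sqrt(m)

def grad_W(x, W, a):
    """Gradient of f_W(x) with respect to W, shape (m, d)."""
    m = W.shape[0]
    active = (W @ x > 0).astype(x.dtype)  # ReLU activation pattern at W
    return np.outer(a * active, x) / np.sqrt(m)

def ntrf(x, W, W0, a):
    """First-order expansion around W0: f_{W0}(x) + <grad f_{W0}(x), W - W0>."""
    return network(x, W0, a) + np.sum(grad_W(x, W0, a) * (W - W0))

rng = np.random.default_rng(0)
d, m = 5, 2000                          # input dimension and width (illustrative)
W0 = rng.normal(size=(m, d))            # random initialization (assumed Gaussian)
a = rng.choice([-1.0, 1.0], size=m)     # fixed second-layer signs (assumption)
x = rng.normal(size=d)
x /= np.linalg.norm(x)                  # unit-norm input

# Weights close to initialization: the linear model tracks the network closely.
W = W0 + 1e-3 * rng.normal(size=(m, d))
gap = abs(network(x, W, a) - ntrf(x, W, W0, a))
print(f"|network - NTRF| near initialization: {gap:.2e}")
```

In this sketch the model is linear in `W - W0`, so fitting it amounts to linear regression or classification on the fixed features `grad_W(x, W0, a)`. The gap between the network and its expansion comes only from hidden units whose activation pattern flips, which is why it stays small when `W` remains near `W0`.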
Citations
Training-Free Neural Active Learning with Initialization-Robustness Guarantees
TL;DR: This paper introduces the expected variance with Gaussian processes (EV-GP) criterion for neural active learning, which is theoretically guaranteed to select data points that lead to trained NNs with both good predictive performance and initialization robustness.
Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training
TL;DR: In this article, the authors study the behavior of neural networks with a large but finite number of parameters and show that the training of sufficiently wide neural networks converges to a global minimum of the corresponding quadratic loss function.
Neural Constrained Combinatorial Bandits
Shangshang Wang, Simeng Bian, Xin Liu, Ziyu Shao
- 17 May 2023
TL;DR: A primal-dual algorithm (Neural-PD) is proposed by integrating neural tangent kernel theory and Lyapunov-drift techniques; its primal component uses multi-layer perceptrons to estimate reward and cost functions, and its dual component estimates the Lagrange multiplier with a virtual queue.
Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity
Jianyi Yang, Shaolei Ren
- 02 Jul 2022
TL;DR: This work considers an informed deep neural network with over-parameterization and domain knowledge integrated into its training objective function, quantitatively demonstrates the two benefits of domain knowledge in informed learning, and proposes a generalized informed training objective to better exploit the benefit of knowledge and balance the label and knowledge imperfectness.
Regularized Q-Learning with Linear Function Approximation
Jiachen Xi, Alfredo Garcia, Petar Momčilović
TL;DR: This paper considers a single-loop algorithm for minimizing the projected Bellman error with finite time convergence guarantees in the case of linear function approximation and provides a performance guarantee for the policies derived from the proposed algorithm.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 1998
TL;DR: In this article, gradient-based learning is shown to synthesize complex decision surfaces that can classify high-dimensional patterns such as handwritten characters, and graph transformer networks (GTNs) are proposed for globally training multimodule document recognition systems.
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 2001
TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 07 Dec 2015
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) is proposed that improves model fitting with nearly zero extra computational cost and little overfitting risk; the resulting networks achieve a 4.94% top-5 test error on the ImageNet 2012 classification dataset.