Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Open Access • Posted Content

- 30 May 2019

274

TL;DR: The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent from random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.
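The NTRF idea in the TL;DR can be illustrated with a minimal NumPy sketch. It assumes a two-layer ReLU network $f(W, x) = m^{-1/2} \sum_r a_r \,\mathrm{relu}(w_r \cdot x)$ in which only the first layer $W$ is trained, and it writes the random feature model as the first-order expansion of $f$ around the initialization $W_0$, using the gradient $\nabla_W f(W_0, x)$ as the feature map. The paper itself treats deep networks, and its exact scaling and function class may differ; all function names here (init_params, network, ntrf_feature, ntrf_model) are hypothetical.

```python
import numpy as np

def init_params(d, m, seed=0):
    """Hypothetical random initialization: Gaussian first layer, fixed random-sign output layer."""
    rng = np.random.default_rng(seed)
    W0 = rng.normal(size=(m, d))          # trainable first-layer weights
    a = rng.choice([-1.0, 1.0], size=m)   # output weights, kept fixed in this sketch
    return W0, a

def network(W, a, x):
    """Two-layer ReLU net f(W, x) = m^{-1/2} * sum_r a_r * relu(w_r . x)."""
    m = W.shape[0]
    return a @ np.maximum(W @ x, 0.0) / np.sqrt(m)

def ntrf_feature(W0, a, x):
    """Gradient of f w.r.t. W at W0; row r is m^{-1/2} * a_r * 1[w_r . x > 0] * x."""
    m = W0.shape[0]
    active = (W0 @ x > 0).astype(float)   # ReLU activation pattern at initialization
    return (a * active)[:, None] * x[None, :] / np.sqrt(m)

def ntrf_model(W, W0, a, x):
    """Random feature model: f(W0, x) + <grad_W f(W0, x), W - W0>, linear in W."""
    phi = ntrf_feature(W0, a, x)
    return network(W0, a, x) + np.sum(phi * (W - W0))

# Usage: at W = W0 the expansion returns exactly the network output.
W0, a = init_params(d=5, m=1000)
x = np.ones(5) / np.sqrt(5)
assert np.isclose(ntrf_model(W0, W0, a, x), network(W0, a, x))
```

Because ReLU is positively homogeneous, the expansion also matches the network exactly along positive rescalings of $W_0$. For other weights it agrees with the network only while the activation patterns stay close to those at initialization, which, roughly speaking, is what large width is meant to ensure.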

Citations

•Posted Content

The Interpolation Phase Transition in Neural Networks: Memorization and Generalization under Lazy Training

Andrea Montanari, +1 more

- 25 Jul 2020

- arXiv: Machine Learning

TL;DR: It is shown that, for two-layer neural networks in the neural tangent (NT) regime, the network approximately performs ridge regression in the raw features, with a strictly positive 'self-induced' regularization.

Preprint•10.48550/arxiv.2406.01977

What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding

Hongkang Li, +5 more

- 04 Jun 2024

TL;DR: The theoretical investigation of a shallow Graph Transformer for semi-supervised node classification reveals that self-attention and positional encoding enhance generalization by promoting the core neighborhood and making the attention map sparse.

•Proceedings Article•10.1145/3477495.3532057

Scalable Exploration for Neural Online Learning to Rank with Perturbed Feedback

Yiling Jia, +1 more

- 13 Jun 2022

TL;DR: This work proposes an efficient, bootstrapping-based exploration strategy for online interactive neural ranker learning. It eliminates explicit confidence-set construction and the associated computational overhead, which allows online neural rankers to be trained efficiently in practice while retaining theoretical guarantees.

Theoretical Characterization of How Neural Network Pruning Affects its Generalization

Hongru Yang, +4 more

TL;DR: In this paper, the authors consider a classification task for overparameterized two-layer neural networks, where the network is randomly pruned at initialization according to different pruning rates.

•Posted Content

Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond

Fanghui Liu, +3 more

- 23 Apr 2020

- arXiv: Machine Learning

TL;DR: This survey systematically reviews the work on random features from the past ten years and discusses the relationship between random features and modern over-parameterized deep neural networks (DNNs), including the use of random features in the analysis of DNNs as well as the gaps between current theoretical and empirical results.

...

References

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors trained a large, deep convolutional neural network that achieved state-of-the-art performance on ImageNet classification; the network consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K

Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, gradient-based learning is shown to synthesize complex decision surfaces that can classify high-dimensional patterns such as handwritten characters, and a graph transformer network (GTN) paradigm is proposed for training multi-module recognition systems globally.

53.5K

Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.

32.7K

•Proceedings Article•10.1109/ICCV.2015.123

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

- 07 Dec 2015

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) is proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk; the resulting network achieves a 4.94% top-5 test error on the ImageNet 2012 classification dataset.

18.2K