Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks

Open Access • Posted Content

- 30 May 2019

274

TL;DR: The expected $0$-$1$ loss of a sufficiently wide ReLU network trained with stochastic gradient descent from random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, called the neural tangent random feature (NTRF) model.
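
To make the phrase "random feature model induced by the network gradient at initialization" concrete, the sketch below builds such features for a toy case. It is an illustration under stated assumptions, not the paper's construction: it assumes a two-layer ReLU network with fixed ±1 output weights, trains only the hidden layer, and uses hypothetical helper names (`init_params`, `ntrf_features`, `ntrf_predict`). The feature vector for an input is the gradient of the network output with respect to the hidden weights at initialization, and the model is the initial output plus a linear function of those features.

```python
import math
import random


def init_params(m, d, seed=0):
    """Randomly initialize a two-layer ReLU network (assumed toy setup)."""
    rng = random.Random(seed)
    # Hidden weights: m neurons, each with d Gaussian entries.
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    # Output weights: fixed random signs, not trained in this sketch.
    a = [rng.choice((-1.0, 1.0)) for _ in range(m)]
    return W, a


def network_output(W, a, x):
    """f_W(x) = (1 / sqrt(m)) * sum_j a_j * relu(w_j . x)."""
    m = len(W)
    total = 0.0
    for w_j, a_j in zip(W, a):
        pre = sum(wi * xi for wi, xi in zip(w_j, x))
        total += a_j * max(pre, 0.0)
    return total / math.sqrt(m)


def ntrf_features(W0, a, x):
    """Gradient of f_W(x) with respect to W, evaluated at W0 and flattened."""
    m = len(W0)
    feats = []
    for w_j, a_j in zip(W0, a):
        pre = sum(wi * xi for wi, xi in zip(w_j, x))
        gate = 1.0 if pre > 0.0 else 0.0  # ReLU derivative at initialization
        feats.extend(a_j * gate * xi / math.sqrt(m) for xi in x)
    return feats


def ntrf_predict(W0, a, theta, x):
    """Initial output plus a linear model on the gradient features."""
    phi = ntrf_features(W0, a, x)
    return network_output(W0, a, x) + sum(p * t for p, t in zip(phi, theta))
```

In this toy setting the features are fixed once the network is initialized, so fitting `theta` is a linear problem; the bound summarized above compares the trained network's expected error with the training loss attainable by models of this form.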

Citations

Journal Article•10.48550/arXiv.2306.04454

Training-Free Neural Active Learning with Initialization-Robustness Guarantees

Zhongxiang Dai, +2 more

- 07 Jun 2023

- arXiv.org

TL;DR: This paper introduces the expected variance with Gaussian processes (EV-GP) criterion for neural active learning, which is theoretically guaranteed to select data points that lead to trained NNs with both good predictive performance and initialization robustness.

1

Journal Article•10.48550/arXiv.2304.03385

Wide neural networks: From non-gaussian random fields at initialization to the NTK geometry of training

Luís Alfredo V. de Carvalho, +3 more

- 06 Apr 2023

- arXiv.org

TL;DR: In this article, the authors study the behavior of neural networks with large, but finite, parameters and show that the training of sufficiently wide neural networks converges to a global minimum of the corresponding quadratic loss function.

1

Journal Article•10.1109/infocom53939.2023.10228958

Neural Constrained Combinatorial Bandits

Shangshang Wang, +3 more

- 17 May 2023

TL;DR: A primal-dual algorithm (Neural-PD) is proposed by integrating neural tangent kernel theory and Lyapunov-drift techniques; its primal component uses multi-layer perceptrons to estimate reward and cost functions, and its dual component estimates the Lagrange multiplier with a virtual queue.

1

Proceedings Article•10.48550/arXiv.2207.00751

Informed Learning by Wide Neural Networks: Convergence, Generalization and Sampling Complexity

Jianyi Yang, +1 more

- 02 Jul 2022

TL;DR: This work considers an informed deep neural network with over-parameterization and domain knowledge integrated into its training objective function, quantitatively demonstrates the two benefits of domain knowledge in informed learning, and proposes a generalized informed training objective to better exploit the benefit of knowledge and balance the label and knowledge imperfectness.

1

Journal Article•10.48550/arxiv.2401.15196

Regularized Q-Learning with Linear Function Approximation

Jiachen Xi, +2 more

- 26 Jan 2024

- arXiv.org

TL;DR: This paper considers a single-loop algorithm for minimizing the projected Bellman error with finite time convergence guarantees in the case of linear function approximation and provides a performance guarantee for the policies derived from the proposed algorithm.

1

References

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.

88.4K

Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.

53.5K

Gradient-based learning applied to document recognition

Yann LeCun, +7 more

- 01 Jan 2001

TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.

32.7K

•Proceedings Article•10.1109/ICCV.2015.123

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

- 07 Dec 2015

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) is proposed that improves model fitting with nearly zero extra computational cost and little overfitting risk, achieving a 4.94% top-5 test error on the ImageNet 2012 classification dataset.

18.2K