Open Access · Posted Content
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks
Yuan Cao, Quanquan Gu
TL;DR: The expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which is called a neural tangent random feature (NTRF) model.
Abstract: We study the training and generalization of deep neural networks (DNNs) in the over-parameterized regime, where the network width (i.e., the number of hidden nodes per layer) is much larger than the number of training data points. We show that the expected $0$-$1$ loss of a wide enough ReLU network trained with stochastic gradient descent (SGD) and random initialization can be bounded by the training loss of a random feature model induced by the network gradient at initialization, which we call a neural tangent random feature (NTRF) model. For data distributions that can be classified by the NTRF model with sufficiently small error, our result yields a generalization error bound on the order of $\tilde{\mathcal{O}}(n^{-1/2})$ that is independent of the network width. Our result is more general and sharper than many existing generalization error bounds for over-parameterized neural networks. In addition, we establish a strong connection between our generalization error bound and the neural tangent kernel (NTK) proposed in recent work.
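To make the NTRF construction concrete, the following is a minimal sketch, not the authors' implementation. It assumes a two-layer ReLU network with fixed ±1 output weights and a $1/\sqrt{m}$ output scaling (simplifications chosen here; the abstract concerns deep networks), and every function name is hypothetical. The random features are the gradient of the network output with respect to the hidden-layer weights at initialization, the NTRF prediction is modeled as the first-order expansion around the initial weights, and the inner product of two feature vectors gives the corresponding finite-width tangent kernel entry.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def init_network(m, d, seed=0):
    """Random initialization: Gaussian hidden weights W0 (m x d), fixed +/-1 output weights a."""
    rng = random.Random(seed)
    W0 = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice([-1.0, 1.0]) for _ in range(m)]
    return W0, a


def forward(W, a, x):
    """Two-layer ReLU network output f_W(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x)."""
    m = len(W)
    return sum(a_j * max(0.0, dot(w_j, x)) for w_j, a_j in zip(W, a)) / math.sqrt(m)


def ntrf_features(W0, a, x):
    """Gradient of f_W(x) w.r.t. the hidden weights at W = W0, flattened to length m * d."""
    scale = 1.0 / math.sqrt(len(W0))
    feats = []
    for w_j, a_j in zip(W0, a):
        active = 1.0 if dot(w_j, x) > 0.0 else 0.0  # derivative of ReLU at w_j . x
        feats.extend(scale * a_j * active * x_k for x_k in x)
    return feats


def ntrf_predict(W0, a, W, x):
    """Linearized (random feature) model: f_W0(x) + <grad f_W0(x), W - W0>."""
    delta = [w - w0 for row, row0 in zip(W, W0) for w, w0 in zip(row, row0)]
    return forward(W0, a, x) + dot(ntrf_features(W0, a, x), delta)


def tangent_kernel_entry(W0, a, x1, x2):
    """Finite-width tangent kernel entry <grad f_W0(x1), grad f_W0(x2)>."""
    return dot(ntrf_features(W0, a, x1), ntrf_features(W0, a, x2))
```

In this sketch only `W` varies while the features stay frozen at `W0`, so fitting `W` is a linear problem in the features. This is the sense in which the model is a random feature model rather than a trained network.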
Citations
•Posted Content
Obtaining Adjustable Regularization for Free via Iterate Averaging
TL;DR: An averaging scheme is established that provably converts the iterates of SGD on an arbitrary strongly convex and smooth objective function to its regularized counterpart with an adjustable regularization parameter and shows that the same methods work empirically on more general optimization objectives including neural networks.
Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing
Minh Ngoc Luu, Minh-Duong Nguyen, Ebrahim Bedeer, Van Duc Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham
TL;DR: This work first formulates an optimization problem that harnesses the sampling process to concurrently reduce overfitting while maximizing accuracy, and introduces an online reinforcement learning algorithm, named Sample-driven Control for Federated Learning (SCFL) built on the Soft Actor-Critic (A2C) framework.
An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
TL;DR: This research investigates the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization and proposes a novel approach called RDBD (Regrettable Delta-Bar-Delta), which allows for prompt correction of biased learning rate adjustments and ensures the convergence of the optimization process.
Sharper analysis of sparsely activated wide neural networks with trainable biases
TL;DR: In this paper, the bias is initialized to a constant rather than zero and then trained, and a generalization bound for the trained network is provided to ensure that the network after sparsification can converge as fast as the original network.
Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium
TL;DR: A novel online learning algorithm is proposed that is able to attain an $O(\sqrt{T})$ regret with polynomial computational complexity, under very mild assumptions on the reward function and the underlying dynamic of the Markov Games.
References
•Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- 03 Dec 2012
TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 1998
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Gradient-based learning applied to document recognition
Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- 01 Jan 2001
TL;DR: This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task; convolutional neural networks are shown to outperform all other techniques.
Mastering the game of Go with deep neural networks and tree search
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy P. Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, Demis Hassabis
TL;DR: Using this search algorithm, the program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0, the first time that a computer program has defeated a human professional player in the full-sized game of Go.
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 07 Dec 2015
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on ImageNet 2012 classification dataset.