On Lazy Training in Differentiable Programming.

Open Access • Posted Content

- 19 Dec 2018

219

TL;DR: In this article, the authors show that the lazy training phenomenon is not specific to over-parameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
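The scaling effect described in the summary can be illustrated with a small numerical sketch. It is not taken from the paper: the tiny tanh network and the helper names `model`, `jacobian`, and `linearization_gap` are hypothetical choices made for illustration. The sketch assumes the output is scaled by a factor `alpha` while the parameters move by a step of size `1/alpha`, and compares the resulting output change with the prediction of the linearization around the initialization. It also forms the Gram matrix of the parameter gradients, which plays the role of the kernel.

```python
import numpy as np

def model(w, x):
    """Toy two-layer network h(w)(x) = sum_j a_j * tanh(b_j * x), with w = [a, b]."""
    a, b = np.split(w, 2)
    return np.tanh(np.outer(x, b)) @ a

def jacobian(w, x):
    """Analytic Jacobian of the outputs with respect to w, shape (len(x), len(w))."""
    a, b = np.split(w, 2)
    t = np.tanh(np.outer(x, b))            # t[i, j] = tanh(b_j * x_i)
    d_a = t                                # derivative w.r.t. a_j
    d_b = (1.0 - t**2) * a * x[:, None]    # derivative w.r.t. b_j
    return np.hstack([d_a, d_b])

def linearization_gap(alpha, w0, d, x):
    """Max difference between the scaled output change and its linear prediction."""
    exact = alpha * (model(w0 + d / alpha, x) - model(w0, x))
    linear = jacobian(w0, x) @ d
    return np.max(np.abs(exact - linear))

rng = np.random.default_rng(0)
m = 8                                      # hidden units (illustrative choice)
w0 = rng.normal(size=2 * m)                # initialization
d = rng.normal(size=2 * m)                 # fixed update direction
x = np.linspace(-1.0, 1.0, 5)              # a few scalar inputs

alphas = (1.0, 10.0, 100.0, 1000.0)
gaps = [linearization_gap(alpha, w0, d, x) for alpha in alphas]
for alpha, gap in zip(alphas, gaps):
    print(f"alpha={alpha:7.1f}  linearization gap={gap:.2e}")

# Gram matrix of parameter gradients at initialization (the tangent kernel
# evaluated on the inputs x); it is symmetric positive semi-definite.
kernel = jacobian(w0, x) @ jacobian(w0, x).T
```

For a smooth model the gap is a second-order Taylor remainder, so it should shrink roughly in proportion to `1/alpha`: with a large scale, an order-one change in the output is achieved by a parameter move so small that the model is effectively linear in its parameters.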

Citations

•Journal Article•10.1088/1742-5468/ABC62B

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent

Jaehoon Lee, +6 more

- 18 Feb 2019

- arXiv: Machine Learning

TL;DR: In this article, the authors show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters.

1K

•Posted Content

Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks

Sanjeev Arora, +4 more

- 24 Jan 2019

- arXiv: Learning

TL;DR: This paper analyzes training and generalization for a simple two-layer ReLU network with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural network with random labels leads to slower training, and a data-dependent complexity measure.

700

•Posted Content

Surprises in High-Dimensional Ridgeless Least Squares Interpolation.

Trevor Hastie, +3 more

- 19 Mar 2019

- arXiv: Statistics Theory

TL;DR: This paper recovers, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.

670

•Posted Content

Underspecification Presents Challenges for Credibility in Modern Machine Learning

Alexander D'Amour, +39 more

- 06 Nov 2020

- arXiv: Learning

TL;DR: This work shows that underspecification appears in a wide variety of practical ML pipelines, and argues that it must be explicitly accounted for in modeling pipelines intended for real-world deployment in any domain.

659

Understand It in 5 Minutes!? Skimming Famous Papers: Jacot, Arthor, Gabriel, Franck and Hongler, Clement : Neural Tangent Kernel : Convergence and Generalization in Neural Networks

Shun-ichi Amari (甘利俊一)

- 15 Apr 2020

474

References

•Proceedings Article•10.1109/CVPR.2016.90

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

- 27 Jun 2016

TL;DR: In this article, the authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting networks won 1st place on the ILSVRC 2015 classification task.

198.7K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

102.6K

•Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

- 01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

51.9K

•Dissertation

Learning Multiple Layers of Features from Tiny Images

Alex Krizhevsky

- 01 Jan 2009

TL;DR: In this paper, the authors describe how to train a multi-layer generative model of natural images, using a dataset of millions of tiny colour images.

23.7K

•Proceedings Article•10.1109/ICCV.2015.123

Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Kaiming He, +3 more

- 07 Dec 2015

TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) is proposed that improves model fitting with nearly zero extra computational cost and little overfitting risk; the resulting networks achieve a 4.94% top-5 test error on the ImageNet 2012 classification dataset.

18.2K