Simon S. Du
University of Washington
169 Papers
1.1K Citations
Simon S. Du is an academic researcher from University of Washington. The author has contributed to research in topics: Computer science & Gradient descent. The author has an h-index of 37 and has co-authored 134 publications. Previous affiliations of Simon S. Du include Carnegie Mellon University.
Papers
•Posted Content
Gradient Descent Provably Optimizes Over-parameterized Neural Networks
TL;DR: This article showed that gradient descent converges at a global linear rate to the global optimum for two-layer fully connected ReLU-activated neural networks, where over-parameterization and random initialization jointly restrict every weight vector to be close to its initialization for all iterations.
970
•Proceedings Article
On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora, Simon S. Du, Wei Hu, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang +5 more
- 26 Apr 2019
TL;DR: The current paper gives the first efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which it calls Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
•Posted Content
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
TL;DR: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.
700
•Posted Content
On Exact Computation with an Infinitely Wide Neural Net
TL;DR: In this paper, the authors presented an efficient exact algorithm for computing the extension of NTK to convolutional neural nets, which they call Convolutional NTK (CNTK), as well as an efficient GPU implementation of this algorithm.
271
Understanding the acceleration phenomenon via high-resolution differential equations
TL;DR: An alternative limiting process yields high-resolution ODEs that permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time and are more accurate surrogates for the underlying algorithms.