Dheeraj Nagaraj
Massachusetts Institute of Technology
44 Papers
154 Citations
Dheeraj Nagaraj is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics including computer science and stochastic gradient descent. The author has an h-index of 9 and has co-authored 24 publications.
Papers
•Proceedings Article
SGD without Replacement: Sharper Rates for General Smooth Convex Functions
Dheeraj Nagaraj, Prateek Jain, Praneeth Netrapalli +2 more
- 24 May 2019
TL;DR: In this article, the authors show that sampling without replacement leads to coupling between iterates and gradients for smooth convex functions and provide nonasymptotic results for stochastic gradient descent when applied to general smooth, strongly-convex functions.
Making the Last Iterate of SGD Information Theoretically Optimal
TL;DR: The main contribution of this work is to design new step size sequences that enjoy information theoretically optimal bounds on the suboptimality of SGD as well as GD, by designing a modification scheme that converts one sequence of step sizes to another so that the last point of SGD/GD with the modified sequence has the same suboptimality guarantees as the average of SGD/GD with the original sequence.
Phase transitions for detecting latent geometry in random graphs
TL;DR: It is proved that the random intersection graph converges in total variation to $G(n, p)$ when $d = \tilde{\omega}(n^3)$ and does not if $d = o(n^3)$, resolving an open problem in Fill et al. (2018).
48
•Posted Content
Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms
TL;DR: In this paper, the authors study the problem of least squares linear regression where the data points are dependent and are sampled from a Markov chain, and establish sharp information theoretic minimax lower bounds for this problem.
•Posted Content
Making the Last Iterate of SGD Information Theoretically Optimal
TL;DR: In this paper, the authors proposed a modification scheme that converts one sequence of step sizes to another so that the last point of SGD/GD with modified sequence has the same suboptimality guarantees as the average of the original sequence.
39