Open Access • Posted Content
Unit Tests for Stochastic Optimization
TL;DR: This paper develops a collection of unit tests for stochastic optimization, each of which evaluates an optimization algorithm on a small-scale, isolated, and well-understood difficulty rather than on real-world scenarios where many such difficulties are entangled.
Abstract: Optimization by stochastic gradient descent is an important component of many large-scale machine learning algorithms. A wide variety of such optimization algorithms have been devised; however, it is unclear whether these algorithms are robust and widely applicable across many different optimization landscapes. In this paper we develop a collection of unit tests for stochastic optimization. Each unit test rapidly evaluates an optimization algorithm on a small-scale, isolated, and well-understood difficulty, rather than in real-world scenarios where many such issues are entangled. Passing these unit tests is not sufficient, but it is absolutely necessary, for any algorithm that claims generality or robustness. We give initial quantitative and qualitative results on numerous established algorithms. The testing framework is open-source, extensible, and easy to apply to new algorithms.
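To make the idea of a single unit test concrete, here is a minimal sketch in Python. It is not the authors' framework: the function names, the choice of a noisy one-dimensional quadratic as the isolated difficulty, the plain SGD update, and the pass threshold `tol` are all assumptions made for illustration.

```python
import math
import random


def noisy_quadratic_grad(x, noise_std, rng):
    """Stochastic gradient of f(x) = 0.5 * x**2 with additive Gaussian noise."""
    return x + rng.gauss(0.0, noise_std)


def sgd(grad_fn, x0, learning_rate, steps):
    """Plain SGD with a fixed learning rate (stand-in for the optimizer under test)."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad_fn(x)
    return x


def run_unit_test(learning_rate, noise_std=1.0, x0=5.0, steps=2000, seed=0, tol=0.5):
    """Run one hypothetical unit test: a noisy 1-D quadratic.

    Returns (passed, final_loss). The test passes when the final loss is
    finite and below `tol`, an arbitrary threshold chosen for this sketch.
    """
    rng = random.Random(seed)  # fixed seed keeps the test reproducible
    x_final = sgd(lambda x: noisy_quadratic_grad(x, noise_std, rng),
                  x0, learning_rate, steps)
    loss = 0.5 * x_final * x_final
    passed = math.isfinite(loss) and loss < tol
    return passed, loss


if __name__ == "__main__":
    for lr in (0.01, 2.5):
        passed, loss = run_unit_test(lr)
        print(f"learning_rate={lr}: passed={passed}, final_loss={loss}")
```

Because the difficulty is isolated (only gradient noise and curvature, one dimension), a failure points directly at the optimizer's step-size behavior: in this sketch a small fixed learning rate settles near the minimum, while a learning rate above 2 makes the iterates diverge.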