Gradient method

Topic Tools

Papers published on a yearly basis

1 / 2

Papers

Journal Article•10.1016/0167-2789(92)90242-F•

Nonlinear total variation based noise removal algorithms

[...]

Leonid I. Rudin, Stanley Osher, Emad Fatemi

01 Nov 1992-Physica D: Nonlinear Phenomena

TL;DR: In this article, a constrained optimization type of numerical algorithm for removing noise from images is presented, where the total variation of the image is minimized subject to constraints involving the statistics of the noise.

...read moreread less

17,333 citations

Journal Article•10.1109/72.279181•

Learning long-term dependencies with gradient descent is difficult

[...]

Yoshua Bengio¹, Patrice Y. Simard², Paolo Frasconi³•Institutions (3)

Université de Montréal¹, AT&T², University of Florence³

01 Mar 1994-IEEE Transactions on Neural Networks

TL;DR: This work shows why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.

...read moreread less

Abstract: Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered. >

...read moreread less

9,187 citations

Book Chapter•10.1007/978-3-7908-2604-3_16•

Large-Scale Machine Learning with Stochastic Gradient Descent

[...]

Léon Bottou

1 Jan 2010

TL;DR: A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems.

...read moreread less

Abstract: During the last decade, the data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods is limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.

...read moreread less

6,901 citations

Journal Article•10.1137/0916069•

A limited memory algorithm for bound constrained optimization

[...]

Richard H. Byrd, Peihuang Lu, Jorge Nocedal, Ciyou Zhu

01 Sep 1995-SIAM Journal on Scientific Computing

TL;DR: An algorithm for solving large nonlinear optimization problems with simple bounds is described, based on the gradient projection method and uses a limited memory BFGS matrix to approximate the Hessian of the objective function.

...read moreread less

Abstract: An algorithm for solving large nonlinear optimization problems with simple bounds is described. It is based on the gradient projection method and uses a limited memory BFGS matrix to approximate the Hessian of the objective function. It is shown how to take advantage of the form of the limited memory approximation to implement the algorithm efficiently. The results of numerical tests on a set of large problems are reported.

...read moreread less

5,938 citations