Journal Article · DOI: 10.1109/72.279181
Learning long-term dependencies with gradient descent is difficult
TL;DR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching on information for long periods.
Abstract: Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
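To make the abstract's claim concrete, here is a minimal numerical sketch (an illustration, not the paper's experiments; the hidden size, weight scales, and horizons are assumptions). In a simple tanh RNN, the gradient of a late hidden state with respect to an early one is a product of per-step Jacobians, so its norm tends to shrink or blow up geometrically as the temporal gap T grows:

```python
# Hedged sketch of vanishing/exploding gradients through time in a tanh RNN.
# All sizes and scales below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
n = 16  # hidden size (arbitrary)

def grad_norm_through_time(W, T):
    """Norm of d h_T / d h_0 for h_t = tanh(W h_{t-1}): a product of T Jacobians."""
    h = 0.1 * rng.standard_normal(n)
    J = np.eye(n)
    for _ in range(T):
        h = np.tanh(W @ h)
        J = np.diag(1.0 - h ** 2) @ W @ J  # one-step Jacobian: diag(1 - h_t^2) @ W
    return np.linalg.norm(J)

for scale in (0.5, 1.0, 1.5):  # rough spectral scale of the recurrent weights
    W = scale * rng.standard_normal((n, n)) / np.sqrt(n)
    norms = [f"{grad_norm_through_time(W, T):.2e}" for T in (1, 10, 50, 100)]
    print(f"scale={scale}: {norms}")
```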
Citations
Evaluation of statistical and machine learning models for time series prediction: identifying the state-of-the-art and the best conditions for the use of each model
TL;DR: Presents one of the most extensive, impartial and comprehensive experimental evaluations ever done in the time series prediction field, showing that SARIMA is the only statistical method able to outperform the machine learning algorithms ANN, SVM, and kNN-TSPI, though without a statistically significant difference.
Cited by 263
Automated Depression Detection Using Deep Representation and Sequence Learning with EEG Signals
Betul Ay, Ozal Yildirim, Muhammed Talo, Ulas Baran Baloglu, Galip Aydin, Subha D. Puthankattil, U. Rajendra Acharya
TL;DR: Proposes a deep hybrid model, built from convolutional neural network (CNN) and long short-term memory (LSTM) architectures, to detect depression from EEG signals; the model could be employed in hospital psychiatry wards to detect depression accurately and thus aid psychiatrists.
Cited by 263
Proceedings Article
Full-capacity unitary recurrent neural networks
Scott Wisdom, Thomas Powers, John R. Hershey, Jonathan Le Roux, Les Atlas
- 05 Dec 2016
TL;DR: This work provides a theoretical argument to determine if a unitary parameterization has restricted capacity, and shows how a complete, full-capacity unitary recurrence matrix can be optimized over the differentiable manifold of unitary matrices.
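As a hedged sketch of the idea this TL;DR refers to (an illustration, not the optimization procedure of the paper): a unitary matrix has all singular values equal to 1, so repeatedly multiplying a backpropagated gradient by its conjugate transpose leaves the gradient's norm unchanged, avoiding the vanishing/exploding behavior of a generic recurrence. Sampling the unitary matrix via QR of a complex Gaussian is an assumed, standard construction:

```python
# Hedged illustration: a unitary recurrence preserves gradient norms exactly.
import numpy as np

rng = np.random.default_rng(1)
n = 32
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(Z)  # the QR factor of a complex Gaussian matrix is unitary

g = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # a backpropagated gradient
print(f"||g|| before: {np.linalg.norm(g):.6f}")
for _ in range(1000):        # push the gradient back through 1000 linear steps
    g = U.conj().T @ g       # each step multiplies by U^H, which is norm-preserving
print(f"||g|| after:  {np.linalg.norm(g):.6f}")  # identical up to float error
```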
An Experimental Review on Deep Learning Architectures for Time Series Forecasting
TL;DR: Among all studied models, long short-term memory networks and convolutional networks are the best alternatives: LSTMs obtain the most accurate forecasts, while CNNs achieve comparable performance with less variability across parameter configurations and greater efficiency.
Cited by 261
Posted Content
Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition
Xiangang Li, Xihong Wu
TL;DR: Alternative deep LSTM architectures are proposed and empirically evaluated on a large vocabulary conversational telephone speech recognition task; experimental results demonstrate that the deep LSTM networks benefit from depth and yield state-of-the-art performance on this task.
References
Optimization by Simulated Annealing
TL;DR: There is a deep and useful connection between statistical mechanics and multivariate or combinatorial optimization (finding the minimum of a given function depending on many parameters), and a detailed analogy with annealing in solids provides a framework for optimization of very large and complex systems.
Cited by 46.9K
Book
Learning internal representations by error propagation
David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
- 03 Jan 1986
TL;DR: This chapter contains sections titled: The Problem, The Generalized Delta Rule, Simulation Results, Some Further Generalizations, Conclusion.
Cited by 16K
A learning algorithm for continually running fully recurrent neural networks
Ronald J. Williams, David Zipser
TL;DR: The exact form of a gradient-following learning algorithm for completely recurrent networks running in continually sampled time is derived and used as the basis for practical algorithms for temporal supervised learning tasks.
Cited by 5K
Minimizing multimodal functions of continuous variables with the “simulated annealing” algorithm
TL;DR: A new global optimization algorithm for functions of continuous variables is presented, derived from the “Simulated Annealing” algorithm recently introduced in combinatorial optimization, which is quite costly in terms of function evaluations, but its cost can be predicted in advance, depending only slightly on the starting point.
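The two simulated annealing references above describe the method in prose; as a hedged illustration of the accept/cool loop they share (not the algorithm of either paper; the objective, step size, and cooling schedule below are arbitrary assumptions), a minimal continuous-variable version looks like this:

```python
# Minimal simulated annealing sketch: accept uphill moves with probability
# exp(-delta/T) and cool T geometrically. Toy objective and schedule assumed.
import math
import random

random.seed(0)

def f(x):
    return x * x + 10.0 * math.sin(3.0 * x)  # a multimodal 1-D test function

x, T = 4.0, 5.0
best = (f(x), x)
while T > 1e-3:
    cand = x + random.gauss(0.0, 0.5)         # local random move
    delta = f(cand) - f(x)
    if delta < 0 or random.random() < math.exp(-delta / T):
        x = cand                              # accept downhill, and some uphill, moves
        best = min(best, (f(x), x))
    T *= 0.999                                # geometric cooling
print("approx. minimum f(x*)=%.3f at x*=%.3f" % best)
```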
Related Papers (5)
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
[...]
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
- 01 Jan 2015