Stable Recurrent Models.

Open Access • Proceedings Article

- 27 Sep 2018

37

TL;DR: Theoretically, stable recurrent neural networks are shown to be well approximated by feed-forward networks for both inference and training by gradient descent; empirically, stable recurrent models often perform as well as their unstable counterparts on benchmark sequence tasks.

Abstract: Stability is a fundamental property of dynamical systems, yet to this date it has had little bearing on the practice of recurrent neural networks. In this work, we conduct a thorough investigation of stable recurrent models. Theoretically, we prove stable recurrent neural networks are well approximated by feed-forward networks for the purpose of both inference and training by gradient descent. Empirically, we demonstrate stable recurrent models often perform as well as their unstable counterparts on benchmark sequence tasks. Taken together, these findings shed light on the effective power of recurrent networks and suggest much of sequence learning happens, or can be made to happen, in the stable regime. Moreover, our results help to explain why in many cases practitioners succeed in replacing recurrent models by feed-forward models.
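As a rough illustration of the abstract's claim that stable recurrent models are well approximated by feed-forward ones, the sketch below compares a scalar tanh RNN run over a full sequence with the same model run over only the last k inputs (a fixed-depth, feed-forward-style computation). The abstract does not define stability; the sketch assumes it means the state update is a contraction, here |w| < 1. The model, the names run_rnn and truncation_gap, and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random

def run_rnn(w, u, xs, h0=0.0):
    """Run a scalar tanh RNN h_t = tanh(w*h_{t-1} + u*x_t); return the final state."""
    h = h0
    for x in xs:
        h = math.tanh(w * h + u * x)
    return h

def truncation_gap(w, u, xs, k):
    """Absolute difference between the full final state and the final state
    computed from only the last k inputs (k >= 1), starting from h = 0."""
    full = run_rnn(w, u, xs)
    truncated = run_rnn(w, u, xs[-k:])  # ignores everything before the last k steps
    return abs(full - truncated)

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
w, u = 0.5, 1.0  # assumed values; |w| < 1 makes the update a contraction

for k in (1, 5, 10, 20):
    gap = truncation_gap(w, u, xs, k)
    bound = abs(w) ** k  # tanh is 1-Lipschitz and |h| <= 1, so gap <= |w|**k
    print(k, gap, bound)
```

Under this assumption the gap shrinks geometrically in k, so a window of modest length already reproduces the recurrent model's output closely. With |w| > 1 the bound no longer applies and long-range inputs can continue to influence the state.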

Citations

•Proceedings Article•10.1109/CVPR.2019.00576

Learning 3D Human Dynamics From Video

Angjoo Kanazawa, +3 more

- 15 Jun 2019

TL;DR: A semi-supervised approach is proposed to learn a representation of the 3D dynamics of humans from video via a simple but effective temporal encoding of image features. The model is designed so that it can learn from videos with 2D pose annotations.

548

•Journal Article•10.1162/TACL_A_00306

Theoretical Limitations of Self-Attention in Neural Sequence Models

Michael Hahn

- 16 Jun 2019

- arXiv: Computation and Language

TL;DR: Across both soft and hard attention, strong theoretical limitations are shown of the computational abilities of self-attention, finding that it cannot model periodic finite-state languages, nor hierarchical structure, unless the number of layers or heads increases with input length.

202

•Posted Content

Training Spiking Neural Networks Using Lessons From Deep Learning.

Jason K. Eshraghian, +8 more

- 27 Sep 2021

- arXiv: Neural and Evolutionary Computing

TL;DR: The authors apply lessons learned from several decades of research in deep learning, gradient descent, backpropagation, and neuroscience to biologically plausible spiking neural networks.

198

•Posted Content

On the Iteration Complexity of Hypergradient Computation

Riccardo Grazzi, +3 more

- 29 Jun 2020

- arXiv: Machine Learning

TL;DR: A unified analysis is presented that, for the first time, allows a quantitative comparison of methods for hypergradient computation, providing explicit bounds on their iteration complexity and suggesting a hierarchy among them in terms of computational efficiency.

129

•Proceedings Article•10.1109/ICCV.2019.00721

Predicting 3D Human Dynamics From Video

Jason Y. Zhang, +3 more

- 01 Oct 2019

TL;DR: An approach is proposed for predicting the future 3D mesh model sequence of a person from past video input, a capability with many practical applications in autonomous systems that must operate safely around people using visual inputs.

112

Related Papers (5)

Long short-term memory

Learning long-term dependencies with gradient descent is difficult

On the difficulty of training recurrent neural networks

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

Finding Structure in Time