Open Access • Posted Content
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Abstract: For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at this http URL .
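The abstract does not spell out the convolutional architecture, so the following is only a minimal sketch of one common way a convolution can model sequences: a causal, dilated 1-D convolution, in which each output depends only on current and past inputs. The function names `causal_conv1d` and `receptive_field`, the averaging filter weights, and the toy signal are all hypothetical choices made for illustration. They are not taken from the paper or its released code.

```python
def causal_conv1d(x, weights, dilation=1):
    """Causal 1-D convolution over a list of floats (illustrative only).

    out[t] = sum_i weights[i] * x[t - i * dilation], where positions
    before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:  # implicit zero-padding on the left keeps it causal
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input steps one output can see in a stack of
    causal convolutions, one layer per entry in `dilations`."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Toy input and a two-layer stack with dilations 1 and 2 (assumed values).
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
layer1 = causal_conv1d(signal, [0.5, 0.5], dilation=1)
layer2 = causal_conv1d(layer1, [0.5, 0.5], dilation=2)

print(layer2)                       # output of the stacked layers
print(receptive_field(2, [1, 2]))   # history length seen by each output
```

Under these assumptions, increasing the dilation from layer to layer lets the history visible to each output grow quickly with depth, which is one way a convolutional model can reach a long effective memory.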
Citations
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context.
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
- 09 Jan 2019
TL;DR: This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
Self-Attentive Sequential Recommendation
Wang-Cheng Kang, Julian McAuley
- 01 Nov 2018
TL;DR: In this article, a self-attention based sequential model (SASRec) is proposed, which uses an attention mechanism to identify which items are 'relevant' from a user's action history and uses them to predict the next item.
Ocean Color Chlorophyll Algorithms for SEAWIFS
John E. O'Reilly, Stéphane Maritorena, B. Greg Mitchell, David A. Siegel, Kendall L. Carder, Sara A. Garver, Mati Kahru, Charles R. McClain
TL;DR: In this article, a large data set containing coincident in situ chlorophyll and remote sensing reflectance measurements was used to evaluate the accuracy, precision, and suitability of a wide variety of ocean color algorithms for use by SeaWiFS (Sea-viewing Wide Field-of-view Sensor).
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo, Nima Mesgarani
TL;DR: This work proposes a fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation that significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
Christopher Choy, JunYoung Gwak, Silvio Savarese
- 15 Jun 2019
TL;DR: In this paper, a generalized sparse convolutional neural network (GS-CNN) was proposed for spatio-temporal perception of 3D-videos, which can directly process 3D videos using high-dimensional convolutions.
References
• Proceedings Article
Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher
- 26 Sep 2016
TL;DR: The pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank while using far fewer parameters than a standard softmax LSTM, and the freely available WikiText corpus is introduced.
• Proceedings Article
Unitary evolution recurrent neural networks
Martin Arjovsky, Amar Shah, Yoshua Bengio
- 19 Jun 2016
TL;DR: This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.
• Proceedings Article
Learning Character-level Representations for Part-of-Speech Tagging
Cicero Nogueira dos Santos, Bianca Zadrozny
- 21 Jun 2014
TL;DR: A deep neural network is proposed that learns character-level representations of words and associates them with the usual word representations to perform POS tagging, producing state-of-the-art POS taggers for two languages.
• Proceedings Article
Learning Recurrent Neural Networks with Hessian-Free Optimization
James Martens, Ilya Sutskever
- 28 Jun 2011
TL;DR: This work solves the long-outstanding problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies and offers a new interpretation of the generalized Gauss-Newton matrix of Schraudolph which is used within the HF approach of Martens.
• Posted Content
How to Construct Deep Recurrent Neural Networks
TL;DR: In this article, the authors explore different ways to extend a recurrent neural network (RNN) to a deep RNN by carefully analyzing and understanding the architecture of an RNN.