Posted Content (Open Access)
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Abstract: For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at this http URL.
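The abstract does not describe the convolutional architecture itself. In the full paper, the temporal convolutional network (TCN) is built from causal, dilated 1D convolutions. The sketch below illustrates only that core, under simplifying assumptions: one convolution per level, dilation doubling at each level, and no nonlinearities, residual connections, weight normalization, or dropout. It is not the authors' released code, and the function names (`causal_dilated_conv1d`, `conv_stack`, `receptive_field`) are hypothetical. It shows one reason a convolutional stack can look far into the past: the receptive field grows with the sum of the dilations, so it roughly doubles with each added level.

```python
# Hypothetical, simplified sketch; not the authors' implementation.
from typing import List, Sequence


def causal_dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """One causal, dilated 1D convolution with implicit zero padding on the left.

    weights[i] multiplies x[t - i * dilation], so y[t] never reads a future input.
    """
    y: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation  # look only backwards in time
            if idx >= 0:  # positions before the start act as zero padding
                acc += w * x[idx]
        y.append(acc)
    return y


def conv_stack(x: Sequence[float], weights: Sequence[float], dilations: Sequence[int]) -> List[float]:
    """Apply one causal convolution per level (no nonlinearity or residuals, for clarity)."""
    h = list(x)
    for d in dilations:
        h = causal_dilated_conv1d(h, weights, d)
    return h


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input steps (including the current one) that can reach one output."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    dilations = [1, 2, 4, 8]  # dilation doubles at each level
    print(receptive_field(2, dilations))  # 16
    impulse = [1.0] + [0.0] * 15
    # Every one of the 16 outputs sees the impulse at t = 0.
    print(conv_stack(impulse, [1.0, 1.0], dilations))
```

With kernel size 2, four dilated levels already cover 16 time steps. A stack without dilation would need 15 layers for the same span. Each layer reads only indices t - i·d ≥ 0, so changing a later input never alters an earlier output.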
Citations
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
- 09 Jan 2019
TL;DR: This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
Self-Attentive Sequential Recommendation
Wang-Cheng Kang, Julian McAuley
- 01 Nov 2018
TL;DR: In this article, a self-attention-based sequential model (SASRec) is proposed, which uses an attention mechanism to identify which items in a user's action history are 'relevant' and uses them to predict the next item.
Ocean Color Chlorophyll Algorithms for SEAWIFS
John E. O'Reilly, Stéphane Maritorena, B. Greg Mitchell, David A. Siegel, Kendall L. Carder, Sara A. Garver, Mati Kahru, Charles R. McClain
TL;DR: In this article, a large data set containing coincident in situ chlorophyll and remote sensing reflectance measurements was used to evaluate the accuracy, precision, and suitability of a wide variety of ocean color algorithms for use by SeaWiFS (Sea-viewing Wide Field-of-view Sensor).
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo, Nima Mesgarani
TL;DR: This work proposes a fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, which significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures.
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
Christopher Choy, JunYoung Gwak, Silvio Savarese
- 15 Jun 2019
TL;DR: In this paper, a generalized sparse convolutional neural network (GS-CNN) is proposed for spatio-temporal perception of 3D videos; it processes 3D videos directly using high-dimensional convolutions.