Open Access | Posted Content
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Abstract: For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at this http URL.
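The abstract does not spell out what the "simple convolutional architecture" looks like or why it would have a long effective memory. As a rough illustration only, and assuming the architecture is a stack of causal, dilated 1D convolutions (an assumption not stated on this page), the sketch below shows how such a stack sees only past inputs and how its receptive field grows with the dilations. The function names `causal_dilated_conv`, `stack`, and `receptive_field` are hypothetical and are not taken from the authors' code.

```python
def causal_dilated_conv(x, weights, dilation):
    """Causal 1D convolution over a list of floats.

    weights[0] multiplies the current step x[t]; weights[j] multiplies
    x[t - j * dilation]. Steps before the start of the sequence count as
    zeros (left padding), so the output has the same length as x.
    """
    out = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # never read from the future or before the start
                total += w * x[idx]
        out.append(total)
    return out


def stack(x, weights, dilations):
    """Apply one causal convolution per dilation, reusing the same weights
    for simplicity (a real network would learn separate weights)."""
    for d in dilations:
        x = causal_dilated_conv(x, weights, d)
    return x


def receptive_field(kernel_size, dilations):
    """Number of time steps (current plus past) that can influence one
    output of the stack."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 11
    # An impulse at t=0 spreads forward over exactly the receptive field.
    print(stack(impulse, [1.0, 1.0], [1, 2, 4]))
    print(receptive_field(2, [1, 2, 4]))
```

With kernel size 2 and dilations 1, 2, 4, three layers already cover 8 time steps. Doubling the dilation at each layer makes the covered history grow exponentially with depth, which is one plausible reading of the "longer effective memory" claim under the stated assumption.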
Citations
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context.
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
- 09 Jan 2019
TL;DR: This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
Self-Attentive Sequential Recommendation
Wang-Cheng Kang, Julian McAuley
- 01 Nov 2018
TL;DR: This article proposes a self-attention based sequential model (SASRec) that uses an attention mechanism to identify which items are 'relevant' in a user's action history and uses them to predict the next item.
Ocean Color Chlorophyll Algorithms for SEAWIFS
John E. O'Reilly, Stéphane Maritorena, B. Greg Mitchell, David A. Siegel, Kendall L. Carder, Sara A. Garver, Mati Kahru, Charles R. McClain
TL;DR: In this article, a large data set containing coincident in situ chlorophyll and remote sensing reflectance measurements was used to evaluate the accuracy, precision, and suitability of a wide variety of ocean color algorithms for use by SeaWiFS (Sea-viewing Wide Field-of-view Sensor).
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo, Nima Mesgarani
TL;DR: This work proposes a fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation that significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
Christopher Choy, JunYoung Gwak, Silvio Savarese
- 15 Jun 2019
TL;DR: In this paper, a generalized sparse convolutional neural network (GS-CNN) was proposed for spatio-temporal perception of 3D-videos, which can directly process 3D videos using high-dimensional convolutions.
References
•Posted Content
Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks
Tim Salimans, Diederik P. Kingma
TL;DR: Weight normalization reparameterizes the weight vectors in a neural network to decouple their length from their direction, improving the conditioning of the optimization problem and speeding up convergence of stochastic gradient descent.
Very deep convolutional networks for text classification
Alexis Conneau, Holger Schwenk, Loïc Barrault, Yann LeCun
- 03 Apr 2017
TL;DR: Very deep convolutional networks (VDCNN) are applied to text classification and achieve state-of-the-art performance on several public text classification tasks.
•Posted Content
Comparative Study of CNN and RNN for Natural Language Processing
TL;DR: This work is the first systematic comparison of CNN and RNN on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection.
•Posted Content
Temporal Convolutional Networks for Action Segmentation and Detection
TL;DR: Temporal Convolutional Networks (TCNs) use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection and can capture action compositions, segment durations, and long-range dependencies.
•Posted Content
Regularizing and Optimizing LSTM Language Models
TL;DR: This paper proposes the weight-dropped LSTM which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization and introduces NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user.