Open Access • Posted Content
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.
Abstract: For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks. To assist related work, we have made code available at this http URL .
Citations
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context.
Zihang Dai, Zhilin Yang, Yiming Yang, Jaime G. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
- 09 Jan 2019
TL;DR: This work proposes a novel neural architecture, Transformer-XL, that enables learning dependencies beyond a fixed length without disrupting temporal coherence; it consists of a segment-level recurrence mechanism and a novel positional encoding scheme.
Self-Attentive Sequential Recommendation
Wang-Cheng Kang, Julian McAuley
- 01 Nov 2018
TL;DR: In this article, a self-attention based sequential model (SASRec) is proposed, which uses an attention mechanism to identify which items are 'relevant' from a user's action history and uses them to predict the next item.
Ocean Color Chlorophyll Algorithms for SEAWIFS
John E. O'Reilly, Stéphane Maritorena, B. Greg Mitchell, David A. Siegel, Kendall L. Carder, Sara A. Garver, Mati Kahru, Charles R. McClain
TL;DR: In this article, a large data set containing coincident in situ chlorophyll and remote sensing reflectance measurements was used to evaluate the accuracy, precision, and suitability of a wide variety of ocean color algorithms for use by SeaWiFS (Sea-viewing Wide Field-of-view Sensor).
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation
Yi Luo, Nima Mesgarani
TL;DR: A fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation, is proposed; it significantly outperforms previous time–frequency masking methods in separating two- and three-speaker mixtures.
4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks
Christopher Choy, JunYoung Gwak, Silvio Savarese
- 15 Jun 2019
TL;DR: In this paper, a generalized sparse convolutional neural network (GS-CNN) was proposed for spatio-temporal perception of 3D videos; it can process 3D videos directly using high-dimensional convolutions.
References
Effective Use of Word Order for Text Categorization with Convolutional Neural Networks
Rie Johnson, Tong Zhang
- 01 Jan 2015
TL;DR: A straightforward adaptation of CNNs from image to text is studied, a simple but new variation that employs bag-of-words conversion in the convolution layer is proposed, and an extension that combines multiple convolution layers is explored for higher accuracy.
•Proceedings Article
How to Construct Deep Recurrent Neural Networks
Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio
- 01 Jan 2014
TL;DR: In this article, the authors explore different ways to extend a recurrent neural network (RNN) to a deep RNN by carefully analyzing and understanding the architecture of an RNN.
Using the Output Embedding to Improve Language Models
Ofir Press, Lior Wolf
- 01 Apr 2017
TL;DR: This article showed that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance and proposed a new method of regularizing the output embedding.
Deep pyramid convolutional neural networks for text categorization
Rie Johnson, Tong Zhang
- 01 Jul 2017
TL;DR: A low-complexity word-level deep convolutional neural network architecture for text categorization that can efficiently represent long-range associations in text and outperforms the previous best models on six benchmark datasets for sentiment classification and topic categorization.
•Posted Content
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
TL;DR: This paper proposes a simpler solution that uses recurrent neural networks composed of rectified linear units; it is comparable to LSTM on four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem, and a benchmark speech recognition problem.
Related Papers (5)
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- 27 Jun 2016
Diederik P. Kingma, Jimmy Ba
- 01 Jan 2015