An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling

Open Access • Posted Content

- 04 Mar 2018

5.4K

TL;DR: A systematic evaluation of generic convolutional and recurrent architectures for sequence modeling concludes that the common association between sequence modeling and recurrent networks should be reconsidered, and that convolutional networks should be regarded as a natural starting point for sequence modeling tasks.

Citations

•Proceedings Article•10.18653/V1/P19-1285

Transformer-XL: Attentive Language Models beyond a Fixed-Length Context.

Zihang Dai, +5 more

- 09 Jan 2019

TL;DR: This work proposes a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence, which consists of a segment-level recurrence mechanism and a novel positional encoding scheme.

4.2K

•Proceedings Article•10.1109/ICDM.2018.00035

Self-Attentive Sequential Recommendation

Wang-Cheng Kang, +1 more

- 01 Nov 2018

TL;DR: In this article, a self-attention based sequential model (SASRec) is proposed, which uses an attention mechanism to identify which items are 'relevant' in a user's action history and uses them to predict the next item.

2.7K

•Journal Article•10.1029/98JC02160

Ocean Color Chlorophyll Algorithms for SEAWIFS

John E. O'Reilly, +7 more

- 15 Oct 1998

- Journal of Geophysical Research

TL;DR: In this article, a large data set containing coincident in situ chlorophyll and remote sensing reflectance measurements was used to evaluate the accuracy, precision, and suitability of a wide variety of ocean color algorithms for use by SeaWiFS (Sea-viewing Wide Field-of-view Sensor).

2.6K

•Journal Article•10.1109/TASLP.2019.2915167

Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation

Yi Luo, +1 more

- 20 Sep 2018

- arXiv: Sound

TL;DR: This work proposes a fully convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech separation that significantly outperforms previous time-frequency masking methods in separating two- and three-speaker mixtures.

2K

•Proceedings Article•10.1109/CVPR.2019.00319

4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks

Christopher Choy, +2 more

- 15 Jun 2019

TL;DR: In this paper, a generalized sparse convolutional neural network (GS-CNN) was proposed for spatio-temporal perception of 3D-videos, which can directly process 3D videos using high-dimensional convolutions.

1.4K

References

•Proceedings Article

Pointer Sentinel Mixture Models

Stephen Merity, +3 more

- 26 Sep 2016

TL;DR: The pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank while using far fewer parameters than a standard softmax LSTM, and the freely available WikiText corpus is introduced.

791

•Proceedings Article

Unitary evolution recurrent neural networks

Martin Arjovsky, +2 more

- 19 Jun 2016

TL;DR: This work constructs an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned, and demonstrates the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.

752

•Proceedings Article

Learning Character-level Representations for Part-of-Speech Tagging

Cicero Nogueira dos Santos, +1 more

- 21 Jun 2014

TL;DR: A deep neural network is proposed that learns character-level representations of words and associates them with the usual word representations to perform POS tagging, producing state-of-the-art POS taggers for two languages.

718

•Proceedings Article

Learning Recurrent Neural Networks with Hessian-Free Optimization

James Martens, +1 more

- 28 Jun 2011

TL;DR: This work solves the long-outstanding problem of how to effectively train recurrent neural networks on complex and difficult sequence modeling problems which may contain long-term data dependencies and offers a new interpretation of the generalized Gauss-Newton matrix of Schraudolph which is used within the HF approach of Martens.

710

•Posted Content

How to Construct Deep Recurrent Neural Networks

Razvan Pascanu, +3 more

- 20 Dec 2013

- arXiv: Neural and Evolutionary Computing

TL;DR: In this article, the authors explore different ways to extend a recurrent neural network (RNN) to a deep RNN by carefully analyzing and understanding the architecture of an RNN.

708

Related Papers (5)

Long short-term memory

Deep Residual Learning for Image Recognition

Adam: A Method for Stochastic Optimization

Attention is All you Need

Dropout: a simple way to prevent neural networks from overfitting