A primer on neural network models for natural language processing

doi:10.1613/JAIR.4992

Open Access • Journal Article • 10.1613/JAIR.4992

Yoav Goldberg

- 01 Sep 2016

- Journal of Artificial Intelligence Resea...

- Vol. 57, Iss: 1, pp 345-420

1.2K

TL;DR: This tutorial surveys neural network models from the perspective of natural language processing research, in an attempt to bring natural-language researchers up to speed with the neural techniques.

Citations

•Journal Article•10.1109/ACCESS.2020.2988550

Sentiment Classification Using a Single-Layered BiLSTM Model

Zabit Hameed, +1 more

- 17 Apr 2020

- IEEE Access

TL;DR: This study presents a computationally efficient deep learning model for binary sentiment classification. The model decides the sentiment polarity of opinions, attitudes, and emotions expressed in written text, and uses only one bidirectional long short-term memory (BiLSTM) layer.

210

•Posted Content

All You Need is "Love": Evading Hate-speech Detection

Tommi Gröndahl, +4 more

- 28 Aug 2018

- arXiv: Computation and Language

TL;DR: It is argued that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria, and all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech.

209

•Journal Article•10.1080/19312458.2018.1455817

More than Bags of Words: Sentiment Analysis with Word Embeddings

Elena Rudkowsky, +5 more

- 10 Apr 2018

- Communication Methods and Measures

TL;DR: The use of word embeddings as part of a supervised machine learning procedure that estimates levels of negativity in parliamentary speeches shows the potential of the word embeddings approach for sentiment analysis in the social sciences.

207

•Proceedings Article•10.18653/V1/E17-1015

Multitask learning for mental health conditions with limited social media data

Adrian Benton, +2 more

- 01 Apr 2017

TL;DR: The framework proposed significantly improves over all baselines and single-task models for predicting mental health conditions, with particularly significant gains for conditions with limited data, and establishes for the first time the potential of deep learning in the prediction of mental health from online user-generated text.

200

•Posted Content

Neural Machine Translation: A Review

Felix Stahlberg

- 04 Dec 2019

TL;DR: This work traces back the origins of modern NMT architectures to word and sentence embeddings and earlier examples of the encoder-decoder network family and concludes with a survey of recent trends in the field.

199

References

Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

99K

•Journal Article•10.1145/3065386

ImageNet classification with deep convolutional neural networks

Alex Krizhevsky, +2 more

- 24 May 2017

- Communications of The ACM

TL;DR: A large, deep convolutional neural network was trained to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes and employed a recently developed regularization method called "dropout" that proved to be very effective.

98.2K

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: A deep convolutional neural network achieved state-of-the-art performance on ImageNet classification. The network consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: This article introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

Journal Article•10.1109/5.726791

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

- 01 Jan 1998

TL;DR: This article shows that multilayer neural networks trained with gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns, such as handwritten characters, and proposes graph transformer networks (GTNs) for globally training multi-module document recognition systems.

53.5K

Related Papers (5)

Glove: Global Vectors for Word Representation

Long short-term memory

Deep learning

Neural Machine Translation by Jointly Learning to Align and Translate

Dropout: a simple way to prevent neural networks from overfitting