Deep Speech: Scaling up end-to-end speech recognition

Open Access•Posted Content

- 17 Dec 2014

2.2K

TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.

Citations

•Proceedings Article•10.1109/ICASSP.2016.7472641

On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition

Liang Lu, +2 more

- 20 Mar 2016

TL;DR: This paper presents a more effective stochastic gradient descent (SGD) learning rate schedule that can significantly improve recognition accuracy, and demonstrates that using multiple recurrent layers in the encoder can reduce the word error rate.

159

•Proceedings Article•10.1109/ICASSP.2017.7952131

A comparison of Deep Learning methods for environmental sound detection

Juncheng Li, +4 more

- 05 Mar 2017

TL;DR: This work compares several state-of-the-art Deep Learning models on the task and data of the IEEE Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge, classifying sounds into one of fifteen common indoor and outdoor acoustic scenes.

156

•Posted Content

Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems

Guangke Chen, +6 more

- 03 Nov 2019

- arXiv: Audio and Speech Processing

TL;DR: This paper conducts the first comprehensive and systematic study of the adversarial attacks on SR systems (SRSs) to understand their security weakness in the practical black-box setting, and proposes an adversarial attack, named FakeBob, to craft adversarial samples.

156

•Journal Article•10.1109/TASLP.2016.2621675

Mispronunciation Detection and Diagnosis in L2 English Speech Using Multidistribution Deep Neural Networks

Kun Li, +2 more

- 01 Jan 2017

- IEEE Transactions on Audio, Speech, and ...

TL;DR: This paper proposes an acoustic-graphemic-phonemic model (AGPM) using a multidistribution DNN whose input features include acoustic features as well as the corresponding graphemes and canonical transcriptions (encoded as binary vectors), yielding a unified MDD framework that works much like free-phone recognition.

154

•Proceedings Article

Ultra-Low Precision 4-bit Training of Deep Neural Networks

Xiao Sun, +9 more

- 01 Jan 2020

TL;DR: This work explores a novel adaptive Gradient Scaling technique (GradScale) that addresses the challenges of insufficient range and resolution in quantized gradients, and examines the impact of quantization errors observed during model training.

154

References

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors achieved state-of-the-art performance with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K

•Proceedings Article•10.1109/CVPR.2015.7298594

Going deeper with convolutions

Christian Szegedy, +8 more

- 07 Jun 2015

TL;DR: Inception is a deep convolutional neural network architecture that set a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

56.6K

•Proceedings Article

Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, +2 more

- 08 Dec 2014

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.

20.1K

•Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

Vinod Nair, +1 more

- 21 Jun 2010

TL;DR: Restricted Boltzmann machines were developed using binary stochastic hidden units; replacing these with noisy rectified linear units yields features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

18.4K

•Journal Article•10.1162/NECO.1989.1.4.541

Backpropagation applied to handwritten zip code recognition

Yann LeCun, +6 more

- 01 Dec 1989

- Neural Computation

TL;DR: This paper demonstrates how constraints from the task domain can be integrated into a backpropagation network through the architecture of the network, successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service.

12.5K

Related Papers (5)

ImageNet Classification with Deep Convolutional Neural Networks

Deep Residual Learning for Image Recognition

Long short-term memory

Very Deep Convolutional Networks for Large-Scale Image Recognition

Gradient-based learning applied to document recognition