Open Access • Posted Content
Deep Speech: Scaling up end-to-end speech recognition
Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng
TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
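The abstract reports "16.0% error" on Switchboard Hub5'00. Assuming this figure is a word error rate (the usual metric for this benchmark), the sketch below shows how such a number is typically computed: the word-level edit distance between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not the authors' scoring pipeline, which would also apply benchmark-specific text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra word in hypothesis (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.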