Deep Speech: Scaling up end-to-end speech recognition

Open Access • Posted Content

- 17 Dec 2014

2.2K

TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.

Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.
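The abstract reports 16.0% error on Switchboard Hub5'00 without defining the metric. Assuming it refers to word error rate (WER), the usual measure for this benchmark, the following minimal sketch shows how such a figure is typically computed: the word-level edit distance between a reference transcript and the system's hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not details taken from the paper, and official scoring tools apply additional text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization, which official benchmark scoring would do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a 16.0% error would correspond to roughly 16 word-level substitutions, deletions, or insertions per 100 reference words.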
