Deep Speech: Scaling up end-to-end speech recognition

Open Access • Posted Content

- 17 Dec 2014

Cited by 2.2K

TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00 benchmark, achieving a 16.0% word error rate on the full test set.
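The 16.0% figure is a word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below computes that metric with a word-level edit distance. It is illustrative only: it splits on whitespace and does none of the transcript normalization that benchmark scoring usually applies first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j]: edits needed to turn the reference prefix seen so far into hyp[:j].
    prev = list(range(len(hyp) + 1))  # empty reference prefix: j insertions
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # empty hypothesis prefix: i deletions
        for j in range(1, len(hyp) + 1):
            substitute = prev[j - 1] + (ref[i - 1] != hyp[j - 1])  # free if the words match
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr[j] = min(substitute, delete, insert)
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.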

Citations

•Proceedings Article•10.1109/SLT48900.2021.9383581

Simplified Self-Attention for Transformer-Based end-to-end Speech Recognition

Haoneng Luo, +3 more

- 19 Jan 2021

TL;DR: This article proposes a simplified self-attention layer that uses FSMN (feedforward sequential memory network) memory blocks instead of projection layers to form the query and key vectors for transformer-based end-to-end speech recognition.
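The idea of forming queries and keys with FSMN memory blocks rather than linear projections can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's layer: the memory block here is a look-back-only, per-dimension (vectorized) filter over past frames with a residual term, queries and keys use separate hypothetical filter coefficients, attention is single-head, and values are the unprojected inputs. The names `fsmn_memory` and `simplified_self_attention` are hypothetical.

```python
import math

def fsmn_memory(h, taps):
    """Vectorized FSMN-style memory block (look-back only, with a residual term).

    h:    list of T frames, each a list of d floats.
    taps: list of coefficient vectors (each length d); taps[i] weights frame t - i.
    Returns m_t = h_t + sum_i taps[i] * h_{t-i} elementwise (frames before t = 0 count as zero).
    """
    T, d = len(h), len(h[0])
    out = []
    for t in range(T):
        m = list(h[t])  # residual connection: an assumption of this sketch
        for i, a in enumerate(taps):
            if t - i >= 0:
                for k in range(d):
                    m[k] += a[k] * h[t - i][k]
        out.append(m)
    return out

def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def simplified_self_attention(h, query_taps, key_taps):
    """Scaled dot-product attention with FSMN-derived queries and keys (values = inputs)."""
    q = fsmn_memory(h, query_taps)
    k = fsmn_memory(h, key_taps)
    d = len(h[0])
    outputs = []
    for t in range(len(h)):
        scores = [sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d) for s in range(len(h))]
        weights = softmax(scores)
        outputs.append([sum(w * h[s][j] for s, w in enumerate(weights)) for j in range(d)])
    return outputs


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
taps = [[0.5, 0.5], [0.25, 0.25]]  # hypothetical filter coefficients
print(fsmn_memory(frames, taps))
print(simplified_self_attention(frames, taps, taps))
```

The memory block's cost grows with the number of taps rather than with d², which is the kind of saving that motivates dropping the projection matrices.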

Cited by 31

•Posted Content

Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

Thai-Son Nguyen, +3 more

- 29 Oct 2019

- arXiv: Audio and Speech Processing

TL;DR: This paper examines the influence of three data augmentation methods on the performance of two sequence-to-sequence (S2S) model architectures: one method is taken from the existing literature, and the other two, a time perturbation in the frequency domain and sub-sequence sampling, are the authors' own development.

Cited by 30

•Posted Content

StochasticNet: Forming Deep Neural Networks via Stochastic Connectivity

Mohammad Javad Shafiee, +2 more

- 22 Aug 2015

- arXiv: Computer Vision and Pattern Recog...

TL;DR: Experimental results show that a StochasticNet using less than half the number of neural connections as a conventional deep neural network achieves comparable accuracy and reduces overfitting on the CIFAR-10, MNIST, and SVHN data sets.
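A network with stochastic connectivity can be sketched by sampling, once and for all, which input-output connections exist. The sketch assumes each connection is kept independently with a fixed probability `p_connect` (0.4 here, to mirror "less than half the connections"); StochasticNet's actual connectivity distribution may differ, and the helper names are hypothetical. Unlike dropout, which resamples a mask at every training step, this mask is drawn at construction and stays fixed.

```python
import random

def stochastic_layer(n_in, n_out, p_connect, rng):
    """Sample a fixed sparse connectivity once: each input-output connection exists
    with probability p_connect. Missing connections get mask 0 and weight 0.0."""
    mask = [[1 if rng.random() < p_connect else 0 for _ in range(n_in)] for _ in range(n_out)]
    weights = [
        [rng.gauss(0.0, 1.0) if mask[o][i] else 0.0 for i in range(n_in)]
        for o in range(n_out)
    ]
    return weights, mask

def forward(weights, x):
    """ReLU layer; absent connections contribute nothing because their weight is 0.0."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


rng = random.Random(0)
weights, mask = stochastic_layer(n_in=100, n_out=100, p_connect=0.4, rng=rng)
kept = sum(map(sum, mask)) / (100 * 100)
print(f"fraction of connections kept: {kept:.3f}")
```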

Cited by 30

•Proceedings Article

Audio Visual Attribute Discovery for Fine-Grained Object Recognition.

Hua Zhang, +2 more

- 27 Apr 2018

TL;DR: This paper introduces a novel feature, audio visual attributes, discovered from the correlations between visual and audio representations, and proposes a unified framework that trains with video-level category labels and can be applied end-to-end at inference.

Cited by 30

•Proceedings Article•10.1109/ICASSP.2018.8462375

Says Who? Deep Learning Models for Joint Speech Recognition, Segmentation and Diarization

Amitrajit Sarkar, +3 more

- 15 Apr 2018

TL;DR: This work adapts state-of-the-art speech recognition models to speech segmentation and diarization, using the LibriSpeech corpus, and obtains results comparable to the state of the art in both tasks.

Cited by 30

References

•Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors achieve state-of-the-art ImageNet classification with a deep convolutional neural network that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
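How five convolutional layers with interleaved max-pooling shrink an image into the input of the fully-connected layers can be traced with the standard output-size formula floor((n + 2p − k) / s) + 1. The layer hyperparameters below (kernel sizes, strides, padding, channel counts, pooling after layers 1, 2, and 5) are the commonly quoted values for this network and are assumptions here, not taken from the summary above; the input is 227×227 because the 224×224 stated in the paper does not divide evenly through the first layer.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding, out_channels); out_channels is None for pooling layers.
# Commonly quoted AlexNet-style hyperparameters (assumption).
LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]

def trace_shapes(size=227, channels=3):
    """Return (name, height, width, channels) after each layer for a square input."""
    shapes = []
    for name, kernel, stride, padding, out_channels in LAYERS:
        size = conv_output_size(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


for shape in trace_shapes():
    print(shape)
_, height, width, channels = trace_shapes()[-1]
# The flattened feature map feeds three fully-connected layers, the last of which has 1000 outputs.
print("features entering the fully-connected layers:", height * width * channels)
```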

Cited by 88.4K

•Proceedings Article•10.1109/CVPR.2015.7298594

Going deeper with convolutions

Christian Szegedy, +8 more

- 07 Jun 2015

TL;DR: Inception, the deep convolutional neural network architecture proposed in this paper, achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Cited by 56.6K

•Proceedings Article

Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, +2 more

- 08 Dec 2014

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
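The encoder-decoder data flow described above can be sketched with plain-Python LSTM cells: one LSTM reads the whole input and its final state serves as the fixed-size vector, and a second LSTM starts from that state and generates outputs one step at a time. This is a toy under loud assumptions: weights are random and untrained, each LSTM is a single layer, and scalar inputs with a linear readout stand in for the paper's word embeddings, deep stacks, and softmax over a vocabulary. All names are hypothetical.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

class LSTMCell:
    """One LSTM layer with scalar input and `hidden` units (random, untrained weights)."""

    def __init__(self, hidden, rng):
        self.hidden = hidden
        # For each gate (input i, forget f, output o, candidate g) and each unit:
        # [input weight, recurrent weights (one per hidden unit), bias].
        self.params = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 2)] for _ in range(hidden)]
            for gate in "ifog"
        }

    def _affine(self, gate, x, h):
        return [row[0] * x + sum(w * hj for w, hj in zip(row[1:-1], h)) + row[-1]
                for row in self.params[gate]]

    def step(self, x, state):
        h, c = state
        i = [sigmoid(v) for v in self._affine("i", x, h)]
        f = [sigmoid(v) for v in self._affine("f", x, h)]
        o = [sigmoid(v) for v in self._affine("o", x, h)]
        g = [math.tanh(v) for v in self._affine("g", x, h)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        return h, c

def encode(cell, inputs):
    """Read the whole input sequence; the final (h, c) is the fixed-size summary."""
    state = ([0.0] * cell.hidden, [0.0] * cell.hidden)
    for x in inputs:
        state = cell.step(x, state)
    return state

def decode(cell, readout, state, steps, start=0.0):
    """Greedy decoding from the encoder's summary, feeding each output back in."""
    outputs, x = [], start
    for _ in range(steps):
        state = cell.step(x, state)
        x = sum(w * hj for w, hj in zip(readout, state[0]))  # linear readout to a scalar
        outputs.append(x)
    return outputs


rng = random.Random(0)
encoder, decoder = LSTMCell(4, rng), LSTMCell(4, rng)
readout = [rng.uniform(-1.0, 1.0) for _ in range(4)]
summary = encode(encoder, [0.1, 0.5, -0.3, 0.9, 0.2])
print(decode(decoder, readout, summary, steps=3))
```

Whatever the input length, `summary` has the same size, which is exactly the bottleneck the fixed-dimensional vector introduces.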

Cited by 20.1K

•Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

Vinod Nair, +1 more

- 21 Jun 2010

TL;DR: Replacing the binary stochastic hidden units of restricted Boltzmann machines with noisy rectified linear units yields features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.
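The noisy rectified linear unit used as the sampling rule for hidden units can be written as max(0, x + N(0, σ(x))), where σ is the logistic sigmoid and N(0, V) is Gaussian noise with variance V. A minimal sketch follows; the function name is hypothetical, and note that Python's `random.gauss` takes a standard deviation, hence the square root.

```python
import math
import random

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def noisy_relu(x, rng):
    """Sample max(0, x + N(0, sigmoid(x))); sigmoid(x) is the noise variance."""
    return max(0.0, x + rng.gauss(0.0, math.sqrt(sigmoid(x))))


rng = random.Random(0)
print([round(noisy_relu(0.5, rng), 3) for _ in range(5)])
```

For strongly negative inputs the noise variance vanishes and the unit is almost surely off; for large positive inputs it behaves like x plus noise of variance close to 1.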

Cited by 18.4K

•Journal Article•10.1162/NECO.1989.1.4.541

Backpropagation applied to handwritten zip code recognition

Yann LeCun, +6 more

- 01 Dec 1989

- Neural Computation

TL;DR: This paper demonstrates how constraints from the task domain can be integrated into a backpropagation network through the network's architecture, and applies the approach successfully to recognizing handwritten zip code digits provided by the U.S. Postal Service.
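One way an architecture encodes such constraints is by restricting each unit to a local receptive field and forcing units at different positions to share the same weights, i.e. a convolution. The 1-D sketch below is a simplified analogue of the 2-D feature maps used on digit images; the helper names and the 256-input size are hypothetical choices for illustration. It shows the two consequences: far fewer free parameters, and a shifted input producing a shifted output.

```python
def conv1d_shared(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as in most deep-learning code):
    every output position reuses the same kernel weights and bias."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) + bias for i in range(len(x) - k + 1)]

def parameter_counts(n_in, kernel_size):
    """Free parameters of a weight-shared layer vs. a fully connected layer
    producing the same number of outputs."""
    n_out = n_in - kernel_size + 1
    shared = kernel_size + 1              # one kernel plus one bias, reused everywhere
    fully_connected = n_out * (n_in + 1)  # every output sees every input and has its own bias
    return shared, fully_connected


print(conv1d_shared([0, 0, 1, 0, 0], [1, 2, 3]))
print(conv1d_shared([0, 0, 0, 1, 0], [1, 2, 3]))  # input shifted right by one
print(parameter_counts(256, 5))
```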

Cited by 12.5K

Related Papers (5)

ImageNet Classification with Deep Convolutional Neural Networks

Deep Residual Learning for Image Recognition

Long short-term memory

Very Deep Convolutional Networks for Large-Scale Image Recognition

Gradient-based learning applied to document recognition