Deep Speech: Scaling up end-to-end speech recognition

Open Access • Posted Content

- 17 Dec 2014

2.2K

TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.
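The error figure quoted above is a word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. The sketch below assumes that standard definition and computes it with a word-level Levenshtein distance. It is an illustration only, not the scoring tool used to produce the reported 16.0%. The function names are hypothetical. A test-set figure is usually corpus-level, so `corpus_wer` sums edits and reference words over all utterances instead of averaging per-utterance rates.

```python
def word_edit_distance(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions between two word sequences."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (one row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[len(hyp)]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance: edits divided by reference length."""
    n_words = len(reference.split())
    if n_words == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(reference, hypothesis) / n_words


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total edits over total reference words."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        total_edits += word_edit_distance(reference, hypothesis)
        total_words += len(reference.split())
    if total_words == 0:
        raise ValueError("corpus must contain at least one reference word")
    return total_edits / total_words


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the"): 2 edits / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Summing edits before dividing weights long utterances more heavily than short ones, which is how a single figure for a whole test set is normally reported.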

Citations

Journal Article • 10.1109/tmm.2023.3308441

SD-NeRF: Towards Lifelike Talking Head Animation via Spatially-adaptive Dual-driven NeRFs

Shuai Shen, +5 more

- IEEE Transactions on Multimedia

TL;DR: This article proposes a fully end-to-end talking head animation method, which implicitly grasps the 3D structures by learning a conditional Neural Radiance Field (NeRF), and develops an audio-motion dual-driven NeRF model to take a step toward more lifelike talking head synthesis.

13

Proceedings Article • 10.21437/INTERSPEECH.2019-1680

Large margin training for attention-based end-to-end speech recognition

Peidong Wang, +3 more

- 15 Sep 2019

TL;DR: A novel large margin training scheme for attention-based end-to-end speech recognition that can achieve performance comparable to the minimum Bayes risk based minimum word error rate (MWER) criterion.

13

Posted Content

Hardware-Guided Symbiotic Training for Compact, Accurate, yet Execution-Efficient LSTM

Hongxu Yin, +5 more

- 30 Jan 2019

- arXiv: Neural and Evolutionary Computing

TL;DR: This work proposes a hardware-guided symbiotic training methodology for compact, accurate, yet execution-efficient inference models, based on the observation that hardware may introduce substantial non-monotonic behavior, which the authors call the latency hysteresis effect, when evaluating network size vs. inference latency.

13

Proceedings Article • 10.21437/CHIME.2018-6

Scaling Speech Enhancement in Unseen Environments with Noise Embeddings

Gil Keren, +2 more

- 07 Sep 2018

TL;DR: This work addresses the problem of speech enhancement generalisation to unseen environments by performing two manipulations that reduce word error rates of a pretrained speech recognition system and improve enhancement quality according to a number of performance measures.

13

Proceedings Article • 10.23919/PICMET.2018.8481823

Artificial Intelligence on Job-Hopping Forecasting: AI on Job-Hopping

Nathan Kosylo, +6 more

- 01 Aug 2018

TL;DR: A novel AI technique, Sequential Optimization of Naive Bayesian (SONB), not only makes predictions but also learns the underlying pattern and automatically estimates missing or unreliable feature values in the input data.

13

...

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

- 03 Dec 2012

TL;DR: The authors trained a large deep convolutional neural network that achieved state-of-the-art ImageNet classification performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

88.4K

Proceedings Article • 10.1109/CVPR.2015.7298594

Going deeper with convolutions

Christian Szegedy, +8 more

- 07 Jun 2015

TL;DR: Inception, the architecture proposed in this paper, is a deep convolutional neural network that achieved the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

56.6K

Proceedings Article

Sequence to Sequence Learning with Neural Networks

Ilya Sutskever, +2 more

- 08 Dec 2014

TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.

20.1K

Proceedings Article

Rectified Linear Units Improve Restricted Boltzmann Machines

Vinod Nair, +1 more

- 21 Jun 2010

TL;DR: Restricted Boltzmann machines were originally developed using binary stochastic hidden units; replacing these with noisy rectified linear units yields features that are better for object recognition on the NORB dataset and face verification on the Labeled Faces in the Wild dataset.

18.4K

Journal Article • 10.1162/NECO.1989.1.4.541

Backpropagation applied to handwritten zip code recognition

Yann LeCun, +6 more

- 01 Dec 1989

- Neural Computation

TL;DR: This paper demonstrates how constraints from the task domain can be integrated into a backpropagation network through the architecture of the network, successfully applied to the recognition of handwritten zip code digits provided by the U.S. Postal Service.

12.5K

...
Related Papers (5)

ImageNet Classification with Deep Convolutional Neural Networks

Deep Residual Learning for Image Recognition

Long short-term memory

Very Deep Convolutional Networks for Large-Scale Image Recognition

Gradient-based learning applied to document recognition