Lexicon-Free Conversational Speech Recognition with Neural Networks
Andrew L. Maas, Ziang Xie, Dan Jurafsky, Andrew Y. Ng
- 01 Jan 2015
- pp 345-354
TL;DR: An approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure, making it possible to directly train a speech recognizer using errors generated by spoken language understanding tasks.
Abstract: We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding tasks. The system naturally handles out of vocabulary words and spoken word fragments. We demonstrate our approach using the challenging Switchboard telephone conversation transcription task, achieving a word error rate competitive with existing baseline systems. To our knowledge, this is the first entirely neural-network-based system to achieve strong speech transcription results on a conversational speech task. We analyze qualitative differences between transcriptions produced by our lexicon-free approach and transcriptions produced by a standard speech recognition system. Finally, we evaluate the impact of large context neural network character language models as compared to standard n-gram models within our framework.
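The decoding step named in the abstract (a beam search that combines per-frame character probabilities with a character-level language model) can be illustrated with a minimal sketch. It assumes the network emits a CTC-style distribution over characters plus a blank symbol at each frame, which the abstract itself does not state. The function name `prefix_beam_search`, the `lm` callback, and the weight `alpha` are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def prefix_beam_search(log_probs, alphabet, blank=0, beam_size=8,
                       lm=None, alpha=0.0):
    """Sketch of CTC prefix beam search (assumed decoder, not the paper's code).

    log_probs: list of frames, each a list of log-probabilities per symbol.
    alphabet:  list of symbols; alphabet[blank] is a placeholder for blank.
    lm:        optional hypothetical callback lm(prefix, char) -> log P(char | prefix).
    alpha:     hypothetical weight on the language model score.
    """
    # Each prefix tracks two scores: log-prob of paths ending in blank,
    # and log-prob of paths ending in a non-blank symbol.
    beam = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beam.items():
            for s, lp in enumerate(frame):
                if s == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                c = alphabet[s]
                new_prefix = prefix + (c,)
                bonus = alpha * lm(prefix, c) if lm else 0.0
                b, nb = nxt[new_prefix]
                if prefix and prefix[-1] == c:
                    # A repeated character extends the prefix only after a blank.
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp + bonus))
                    # Without a blank, the repeat collapses into the same prefix.
                    b2, nb2 = nxt[prefix]
                    nxt[prefix] = (b2, logsumexp(nb2, p_nb + lp))
                else:
                    nxt[new_prefix] = (
                        b, logsumexp(nb, p_b + lp + bonus, p_nb + lp + bonus))
        # Keep only the highest-scoring prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]),
                        reverse=True)
        beam = dict(ranked[:beam_size])
    best_prefix, _ = max(beam.items(), key=lambda kv: logsumexp(*kv[1]))
    return "".join(best_prefix)

# Two frames where the single best path is blank-blank, yet the prefix "a"
# has more total probability once all of its alignments are summed.
frames = [[math.log(0.6), math.log(0.4)], [math.log(0.6), math.log(0.4)]]
print(prefix_beam_search(frames, ["_", "a"]))
```

Because the search scores characters rather than dictionary words, nothing in it restricts the output to a fixed vocabulary, which is in keeping with the abstract's remark about out-of-vocabulary words and word fragments.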
Citations
Direct Acoustics-to-Word Models for English Conversational Speech Recognition
Kartik Audhkhasi, Bhuvana Ramabhadran, George Saon, Michael Picheny, David Nahamoo
- 22 Mar 2017
TL;DR: This paper presents the first results employing direct acoustics-to-word CTC models on two well-known public benchmark tasks, Switchboard and CallHome. It also presents rescoring results on CTC word model lattices to quantify the performance benefits of an LM, and contrasts the performance of word and phone CTC models.
An On-chip Reconfigurable Multi-layer Perceptron and Recurrent Neural Network Processor for Speech Recognition
Junaid Hussain Muzamal, Anne Kwong, Muhammad Asghar
TL;DR: Researchers propose an on-chip reconfigurable MLP-RNN processor for speech recognition, achieving 75% weight reduction and 5x power improvement through LUT-powered multiplication, optimized for performance, cost, and power consumption in a 65nm CMOS process.
Quran Recitation Recognition using End-to-End Deep Learning
TL;DR: In this article, a CNN-Bidirectional GRU encoder and a character-based decoder were used to recognize the recitation of the Holy Quran in a public dataset.
Improving Deep Learning Based Automatic Speech Recognition for Gujarati
Deepang Raval, Vyom Pathak, Muktan Patel, Brijesh Bhatt
TL;DR: This study improves Gujarati automatic speech recognition using a deep learning-based approach, incorporating novel techniques such as prefix decoding and post-processing, achieving a 5.87% decrease in Word Error Rate on the Microsoft Speech Corpus.
LipType: A Silent Speech Recognizer Augmented with an Independent Repair Model
Laxmi Pandey, Ahmed Sabbir Arif
- 06 May 2021
TL;DR: In this article, an optimized version of LipNet is developed for improved speed and accuracy, together with an independent repair model that processes video input for poor lighting conditions, when applicable, and corrects potential errors in the output for increased accuracy.
References
•Proceedings Article
Deep Sparse Rectifier Neural Networks
Xavier Glorot, Antoine Bordes, Yoshua Bengio
- 14 Jun 2011
TL;DR: This paper shows that rectifying neurons are an even better model of biological neurons and yield equal or better performance than hyperbolic tangent networks in spite of the hard non-linearity and non-differentiability.
•Proceedings Article
The Kaldi Speech Recognition Toolkit
Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Kumar Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, Karel Vesely
- 01 Jan 2011
TL;DR: The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.
•Proceedings Article
Recurrent neural network based language model
Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan Cernocký, Sanjeev Khudanpur
- 01 Jan 2010
TL;DR: Results indicate that it is possible to obtain around 50% reduction of perplexity by using a mixture of several RNN LMs, compared to a state-of-the-art backoff language model.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
Alex Graves, Santiago Fernández, Faustino Gomez, Jürgen Schmidhuber
- 25 Jun 2006
TL;DR: This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.
•Journal Article
Deep Neural Networks for Acoustic Modeling in Speech Recognition
Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury
TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.