Speech analytics

Papers published on a yearly basis

Papers

Book

Fundamentals of speech recognition

Lawrence R. Rabiner, Biing-Hwang Juang

1 Jan 1993

TL;DR: This book covers the fundamentals of speech recognition, from the speech signal and signal processing methods through pattern comparison, hidden Markov models, and connected word models to large vocabulary continuous speech recognition and task-oriented applications.

Abstract:
1. Fundamentals of Speech Recognition.
2. The Speech Signal: Production, Perception, and Acoustic-Phonetic Characterization.
3. Signal Processing and Analysis Methods for Speech Recognition.
4. Pattern Comparison Techniques.
5. Speech Recognition System Design and Implementation Issues.
6. Theory and Implementation of Hidden Markov Models.
7. Speech Recognition Based on Connected Word Models.
8. Large Vocabulary Continuous Speech Recognition.
9. Task-Oriented Applications of Automatic Speech Recognition.

9,412 citations

Posted Content

Deep Speech: Scaling up end-to-end speech recognition

Awni Hannun¹, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng•Institutions (1)

Baidu¹

17 Dec 2014-arXiv: Computation and Language

TL;DR: Deep Speech, a state-of-the-art speech recognition system developed using end-to-end deep learning, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set.

Abstract: We present a state-of-the-art speech recognition system developed using end-to-end deep learning. Our architecture is significantly simpler than traditional speech systems, which rely on laboriously engineered processing pipelines; these traditional systems also tend to perform poorly when used in noisy environments. In contrast, our system does not need hand-designed components to model background noise, reverberation, or speaker variation, but instead directly learns a function that is robust to such effects. We do not need a phoneme dictionary, nor even the concept of a "phoneme." Key to our approach is a well-optimized RNN training system that uses multiple GPUs, as well as a set of novel data synthesis techniques that allow us to efficiently obtain a large amount of varied data for training. Our system, called Deep Speech, outperforms previously published results on the widely studied Switchboard Hub5'00, achieving 16.0% error on the full test set. Deep Speech also handles challenging noisy environments better than widely used, state-of-the-art commercial speech systems.

2,309 citations

Journal Article • DOI: 10.1016/J.PATCOG.2010.09.020

Survey on speech emotion recognition: Features, classification schemes, and databases

Moataz M. H. El Ayadi¹, Mohamed S. Kamel², Fakhri Karray²•Institutions (2)

Cairo University¹, University of Waterloo²

01 Mar 2011-Pattern Recognition

TL;DR: A survey of speech emotion recognition addressing three important aspects of system design: the choice of suitable features for speech representation, the design of an appropriate classification scheme, and the proper preparation of an emotional speech database for evaluating system performance.

2,166 citations

Book

Spoken Language Processing: A Guide to Theory, Algorithm, and System Development

Xuedong Huang¹, Alex Acero¹, Hsiao-Wuen Hon¹, Raj Reddy•Institutions (1)

Microsoft¹

1 Jan 2001

TL;DR: Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond to create the state of the art in spoken language technology.

Abstract: From the Publisher: New advances in spoken language processing: theory and practice. In-depth coverage of speech processing, speech recognition, speech synthesis, spoken language understanding, and speech interface design. Many case studies from state-of-the-art systems, including examples from Microsoft's advanced research labs. Spoken Language Processing draws on the latest advances and techniques from multiple fields: computer science, electrical engineering, acoustics, linguistics, mathematics, psychology, and beyond. Starting with the fundamentals, it presents all this and more: essential background on speech production and perception, probability and information theory, and pattern recognition; extracting information from the speech signal: useful representations and practical compression solutions; modern speech recognition techniques: hidden Markov models, acoustic and language modeling, improving resistance to environmental noises, search algorithms, and large vocabulary speech recognition; text-to-speech: analyzing documents, pitch and duration controls, trainable synthesis, and more; spoken language understanding: dialog management, spoken language applications, and multimodal interfaces. To illustrate the book's methods, the authors present detailed case studies based on state-of-the-art systems, including Microsoft's Whisper speech recognizer, Whistler text-to-speech system, Dr. Who dialog system, and the MiPad handheld device. Whether you're planning, designing, building, or purchasing spoken language technology, this is the state of the art, from algorithms through business productivity.

2,036 citations

Journal Article • DOI: 10.1121/1.2229005

An audio-visual corpus for speech perception and automatic speech recognition

Martin Cooke¹, Jon Barker, Stuart Cunningham, Xu Shao•Institutions (1)

University of Sheffield¹

24 Oct 2006-Journal of the Acoustical Society of America

TL;DR: An audio-visual corpus that consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers to support the use of common material in speech perception and automatic speech recognition studies.

Abstract: An audio-visual corpus has been collected to support the use of common material in speech perception and automatic speech recognition studies. The corpus consists of high-quality audio and video recordings of 1000 sentences spoken by each of 34 talkers. Sentences are simple, syntactically identical phrases such as "place green at B 4 now". Intelligibility tests using the audio signals suggest that the material is easily identifiable in quiet and low levels of stationary noise. The annotated corpus is available on the web for research use.

1,355 citations

Year	Papers
2025	1
2024	4
2023	7
2022	16
2021	7
2020	7
