Journal Article, DOI: 10.1016/S0167-6393(98)00032-6
Robust speech recognition using the modulation spectrogram
Brian Kingsbury, Nelson Morgan, Steven Greenberg
TL;DR: Using the modulation spectrogram as a front end for ASR significantly improves performance on highly reverberant speech. Combining it with log-RASTA-PLP significantly improves performance over a range of noisy and reverberant conditions, suggesting that the use of multiple representations is another promising method for improving the robustness of ASR systems.
About: This article is published in Speech Communication. The article was published on 01 Aug 1998. The article focuses on the topics: Speech processing & Spectrogram.
Citations
Speaker Diarization: A Review of Recent Research
Xavier Anguera Miro, Simon Bozonnet, Nicholas Evans, Corinne Fredouille, Gerald Friedland, Oriol Vinyals
TL;DR: An analysis of speaker diarization performance, as reported through the NIST Rich Transcription evaluations on meeting data, is presented, and important areas for future research are identified.
Automatic speech recognition and speech variability: A review
Mohamed Faouzi BenZeghiba, R. De Mori, Olivier Deroo, Stéphane Dupont, T. Erbes, D. Jouvet, Luciano Fissore, Pietro Laface, Alfred Mertins, Christophe Ris, Richard Rose, Vivek Tyagi, Christian Wellekens
TL;DR: Current advances related to automatic speech recognition (ASR) and spoken language systems and deficiencies in dealing with variation naturally present in speech are outlined.
617
Power-normalized cepstral coefficients (PNCC) for robust speech recognition
Chanwoo Kim, Richard M. Stern
TL;DR: Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.
Power-Normalized Cepstral Coefficients (PNCC) for robust speech recognition
Chanwoo Kim, Richard M. Stern
- 25 Mar 2012
TL;DR: Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.
Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition
Takuya Yoshioka, Armin Sehr, Marc Delcroix, Keisuke Kinoshita, Roland Maas, Tomohiro Nakatani, Walter Kellermann
TL;DR: For a number of unexplored but important applications, distant microphones are a prerequisite for extending the availability of speech recognizers as well as enhancing the convenience of existing speech recognition applications.
References
Perceptual linear predictive (PLP) analysis of speech
TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
3.1K
RASTA processing of speech
Hynek Hermansky, Nelson Morgan
TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.
2.1K
Effect of temporal envelope smearing on speech reception
TL;DR: The effect of smearing the temporal envelope on the speech-reception threshold (SRT) for sentences in noise and on phoneme identification was investigated for normal-hearing listeners, showing a severe reduction in sentence intelligibility for narrow processing bands at low cutoff frequencies.
910
A physical method for measuring speech-transmission quality
TL;DR: The resulting index, the Speech-Transmission Index (STI), has been correlated with subjective intelligibility scores obtained on 167 different transmission channels with a wide variety of disturbances and the relative predictive power of the STI appeared to be 5%.
870
A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria
TL;DR: A series of studies on various aspects of the chain of relations between auditorium acoustics, the modulation transfer function (MTF), the speech transmission index (STI), and speech intelligibility is presented.
759