Robust speech recognition using the modulation spectrogram

doi:10.1016/S0167-6393(98)00032-6

Journal Article•10.1016/S0167-6393(98)00032-6


Brian Kingsbury, +5 more

- 01 Aug 1998

- Speech Communication

- Vol. 25, Iss: 1, pp 117-132

302

TL;DR: Using the modulation spectrogram as a front end for ASR significantly improves performance on highly reverberant speech. Combining it with log-RASTA-PLP significantly improves performance over a range of noisy and reverberant conditions. This suggests that using multiple representations is another promising way to make ASR systems more robust.
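
The summary does not say how a modulation spectrogram is computed. The following is only a rough sketch of the general idea under stated assumptions, not the paper's front end. It splits the signal into frequency bands, tracks each band's amplitude envelope at a 100 Hz frame rate, and keeps only slow envelope fluctuations (0 to 8 Hz, around the roughly 4 Hz syllabic rate). Several choices here are hypothetical: equal-width FFT bands stand in for a critical-band filterbank, the 25 ms / 10 ms framing and the 8 Hz cutoff are illustrative, and the names `band_envelopes` and `modulation_spectrogram` are invented for this example. It requires NumPy.

```python
import numpy as np


def band_envelopes(x, fs, n_bands=8, frame_s=0.025, hop_s=0.010):
    """Frame-rate amplitude envelopes in n_bands equal-width bands.

    Equal-width FFT bands are a hypothetical stand-in for a critical-band
    filterbank. Returns (envelopes of shape (n_frames, n_bands), frame rate in Hz).
    """
    n = int(round(frame_s * fs))
    hop = int(round(hop_s * fs))
    window = np.hanning(n)
    n_frames = 1 + (len(x) - n) // hop
    frames = np.stack([x[i * hop : i * hop + n] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2        # short-time power spectrum
    edges = np.linspace(0, power.shape[1], n_bands + 1).astype(int)
    env = np.stack(
        [np.sqrt(power[:, lo:hi].sum(axis=1)) for lo, hi in zip(edges[:-1], edges[1:])],
        axis=1,
    )
    return env, fs / hop


def modulation_spectrogram(env, frame_rate, max_mod_hz=8.0):
    """Magnitude of each band's envelope after an analytic 0..max_mod_hz band-pass.

    For brevity this filters with one FFT over the whole utterance; a real front
    end would use causal filters. Keeping only positive modulation frequencies
    (doubled) makes the filtered envelope complex, so its magnitude is a smooth
    measure of slow-modulation energy per band and frame.
    """
    n = env.shape[0]
    spec = np.fft.fft(env - env.mean(axis=0), axis=0)      # remove DC, FFT over time
    freqs = np.fft.fftfreq(n, d=1.0 / frame_rate)
    keep = (freqs > 0) & (freqs <= max_mod_hz)
    spec[~keep] = 0
    return np.abs(np.fft.ifft(2 * spec, axis=0))
```

As a check, consider a 500 Hz tone amplitude-modulated at 4 Hz mixed with a 3 kHz tone modulated at 20 Hz. The band holding the 500 Hz tone should produce far more output than the band holding the 3 kHz tone, because the 20 Hz fluctuation lies outside the retained modulation range.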

Citations

Journal Article•10.1109/TASL.2011.2125954

Speaker Diarization: A Review of Recent Research

Xavier Anguera Miro, +5 more

- 01 Feb 2012

- IEEE Transactions on Audio, Speech, and ...

TL;DR: Presents an analysis of speaker diarization performance as reported through the NIST Rich Transcription evaluations on meeting data, and identifies important areas for future research.

848

Journal Article•10.1016/J.SPECOM.2007.02.006

Automatic speech recognition and speech variability: A review

Mohamed Faouzi BenZeghiba, +12 more

- 01 Oct 2007

- Speech Communication

TL;DR: Outlines current advances in automatic speech recognition (ASR) and spoken language systems, along with their deficiencies in dealing with the variation naturally present in speech.

617

Journal Article•10.1109/TASLP.2016.2545928

Power-normalized cepstral coefficients (PNCC) for robust speech recognition

Chanwoo Kim, +1 more

- 01 Jul 2016

- IEEE Transactions on Audio, Speech, and ...

TL;DR: Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.

508

Proceedings Article•10.1109/ICASSP.2012.6288820

Power-Normalized Cepstral Coefficients (PNCC) for robust speech recognition

Chanwoo Kim, +1 more

- 25 Mar 2012

TL;DR: Experimental results demonstrate that PNCC processing provides substantial improvements in recognition accuracy compared to MFCC and PLP processing for speech in the presence of various types of additive noise and in reverberant environments, with only slightly greater computational cost than conventional MFCC processing.

398

Journal Article•10.1109/MSP.2012.2205029

Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition

Takuya Yoshioka, +6 more

- 18 Oct 2012

- IEEE Signal Processing Magazine

TL;DR: For a number of unexplored but important applications, distant microphones are a prerequisite for extending the availability of speech recognizers as well as enhancing the convenience of existing speech recognition applications.

313

...

References

Journal Article•10.1121/1.399423

Perceptual linear predictive (PLP) analysis of speech

Hynek Hermansky

- 01 Apr 1990

- Journal of the Acoustical Society of Ame...

TL;DR: Presents a new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.

3.1K
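
The three psychophysical concepts behind PLP are critical-band spectral resolution, equal-loudness pre-emphasis, and the intensity-loudness power law. The sketch below writes out the formulas for each step as they are commonly quoted from the PLP paper: Bark warping, the equal-loudness weight E(ω) with ω = 2πf, and cube-root compression. The function names are hypothetical, and the critical-band integration, all-pole modelling, and cepstral conversion of full PLP are omitted.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band (Bark) warping: 6 ln(r + sqrt(r^2 + 1)), with r = w / (1200 pi)."""
    r = (2.0 * math.pi * f_hz) / (1200.0 * math.pi)   # equals f_hz / 600
    return 6.0 * math.log(r + math.sqrt(r * r + 1.0))


def equal_loudness(f_hz):
    """Equal-loudness pre-emphasis weight E(w), w = 2 pi f (approximates hearing near 40 dB)."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def loudness(intensity):
    """Intensity-loudness power law: cube-root amplitude compression."""
    return intensity ** 0.33


def auditory_spectrum(center_freqs_hz, band_energies):
    """Hypothetical helper: pre-emphasize critical-band energies, then compress them."""
    return [loudness(equal_loudness(f) * e) for f, e in zip(center_freqs_hz, band_energies)]
```

At 600 Hz, r equals 1, so hz_to_bark(600.0) returns 6 ln(1 + √2), about 5.29 Bark. The equal-loudness weight rises steeply at low frequencies, which de-emphasizes energy below roughly 1 kHz.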

Journal Article•10.1109/89.326616

RASTA processing of speech

Hynek Hermansky, +1 more

- 01 Oct 1994

- IEEE Transactions on Speech and Audio Pr...

TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.

2.1K
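
RASTA band-pass filters the time trajectory of each log-compressed critical-band energy. This minimal sketch assumes the commonly cited filter H(z) = 0.1 z^4 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1). It is implemented without the z^4 advance, so its output is delayed by four frames. Some implementations use a pole of 0.94 instead of 0.98, so the pole is left as a parameter. The numerator coefficients sum to zero, which means a constant offset in the log domain (such as a fixed convolutional channel) is removed once the start-up transient decays. The names `rasta_filter` and `rasta` are hypothetical.

```python
def rasta_filter(trajectory, pole=0.98):
    """RASTA band-pass filter applied to one log-energy trajectory (one value per frame).

    Causal difference equation (4 frames of delay relative to the z^4-advanced form):
        y[n] = pole * y[n-1] + 0.1 * (2 x[n] + x[n-1] - x[n-3] - 2 x[n-4])
    Starts from zero state, so the first frames contain a transient.
    """
    b = [0.2, 0.1, 0.0, -0.1, -0.2]  # numerator 0.1 * (2, 1, 0, -1, -2); sums to zero
    out = []
    y_prev = 0.0
    for n in range(len(trajectory)):
        fir = sum(b[k] * trajectory[n - k] for k in range(len(b)) if n - k >= 0)
        y_prev = pole * y_prev + fir
        out.append(y_prev)
    return out


def rasta(log_bands):
    """Filter every band of a log spectrogram given as a list of frames (each a list of band values)."""
    columns = [rasta_filter(list(col)) for col in zip(*log_bands)]
    return [list(frame) for frame in zip(*columns)]
```

Adding a constant (for example, a fixed channel gain in dB) to every frame of a trajectory leaves the filtered output unchanged after the transient. With the 0.98 pole, the residual decays as 0.98 per frame.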

Journal Article•10.1121/1.408467

Effect of temporal envelope smearing on speech reception

Rob Drullman, +2 more

- 01 Feb 1994

- Journal of the Acoustical Society of Ame...

TL;DR: The effect of smearing the temporal envelope on the speech-reception threshold (SRT) for sentences in noise and on phoneme identification was investigated for normal-hearing listeners, showing a severe reduction in sentence intelligibility for narrow processing bands at low cutoff frequencies.

910

Journal Article•10.1121/1.384464

A physical method for measuring speech-transmission quality

H. J. M. Steeneken, +1 more

- 01 Jan 1980

- Journal of the Acoustical Society of Ame...

TL;DR: The resulting index, the Speech-Transmission Index (STI), has been correlated with subjective intelligibility scores obtained on 167 different transmission channels with a wide variety of disturbances and the relative predictive power of the STI appeared to be 5%.

870

Journal Article•10.1121/1.392224

A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria

Tammo Houtgast, +1 more

- 01 Mar 1985

- Journal of the Acoustical Society of Ame...

TL;DR: Presents a series of studies on the chain of relations between auditorium acoustics, the modulation transfer function (MTF), the speech transmission index (STI), and speech intelligibility.

759
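
The STI and MTF references connect room acoustics to intelligibility through the modulation transfer function. The following is a simplified sketch. It assumes an exponentially decaying reverberant field with reverberation time T (seconds) and stationary noise at a given SNR, so that m(F) = [1 + (2πF T / 13.8)^2]^(-1/2) · [1 + 10^(-SNR/10)]^(-1). Each m is mapped to an apparent SNR of 10 log10(m / (1 - m)) dB, clipped to ±15 dB, and rescaled to a transmission index between 0 and 1. The full STI also averages over seven octave bands with band weights. This single-band, unweighted version and its function names are illustrative only.

```python
import math

# The 14 modulation frequencies (1/3-octave steps, 0.63 to 12.5 Hz) used for STI.
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(f_mod, t60, snr_db=math.inf):
    """Modulation reduction factor for exponential reverberation (t60 in s) plus stationary noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * t60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def simplified_sti(t60, snr_db=math.inf):
    """Single-band, unweighted STI-style index in [0, 1] (not the full standardized procedure)."""
    tis = []
    for f in MOD_FREQS_HZ:
        m = min(max(mtf(f, t60, snr_db), 1e-9), 1.0 - 1e-9)  # keep the log argument finite
        snr_app = 10.0 * math.log10(m / (1.0 - m))           # apparent SNR in dB
        snr_app = max(-15.0, min(15.0, snr_app))             # clip to +/-15 dB
        tis.append((snr_app + 15.0) / 30.0)                  # transmission index
    return sum(tis) / len(tis)
```

With no reverberation and no noise the index is 1.0. With no reverberation and noise at 0 dB SNR, every m is 0.5, so the index is 0.5. Longer reverberation times mostly reduce m at the higher modulation frequencies, which lowers the index.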