Journal Article (DOI: 10.1121/1.399423)
Perceptual linear predictive (PLP) analysis of speech
TL;DR: The perceptual linear predictive (PLP) technique is a new method for speech analysis that uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, yielding a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
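The stages named in the abstract (critical-band integration, equal-loudness weighting, intensity-loudness compression, then an all-pole fit) can be sketched for a single analysis frame as follows. This is a minimal NumPy sketch under stated assumptions, not the paper's reference implementation. The Bark warping 6·asinh(f/600), the critical-band curve (flat over ±0.5 Bark, a shallow 10 dB/Bark skirt below the band centre, a steep 25 dB/Bark skirt above), the equal-loudness function E(ω) and the 0.33 exponent follow the forms commonly quoted from the body of the paper, which this page does not reproduce. The Hamming window, the 256-point FFT, the 18 band samples spaced evenly from 0 Bark to the Nyquist frequency, and all helper names (`hz_to_bark`, `bark_to_hz`, `critical_band_weight`, `equal_loudness`, `levinson_durbin`, `plp_frame`) are hypothetical choices made for illustration.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Bark warping Omega(f) = 6 * ln(f/600 + sqrt((f/600)^2 + 1)) = 6 * asinh(f/600)."""
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)


def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 600.0 * np.sinh(np.asarray(z, dtype=float) / 6.0)


def critical_band_weight(dz):
    """Weight of an FFT bin lying dz Bark from a band centre (dz = bin - centre).

    Assumed shape: flat within +/-0.5 Bark, 10 dB/Bark skirt on the low-frequency
    side (down to -2.5 Bark), 25 dB/Bark skirt on the high side (up to +1.3 Bark).
    """
    dz = np.asarray(dz, dtype=float)
    w = np.zeros_like(dz)
    below = (dz >= -2.5) & (dz <= -0.5)
    flat = (dz > -0.5) & (dz < 0.5)
    above = (dz >= 0.5) & (dz <= 1.3)
    w[below] = 10.0 ** (dz[below] + 0.5)
    w[flat] = 1.0
    w[above] = 10.0 ** (-2.5 * (dz[above] - 0.5))
    return w


def equal_loudness(f_hz):
    """Equal-loudness weighting E(w) with w in rad/s (assumed form, see lead-in)."""
    w2 = (2.0 * np.pi * np.asarray(f_hz, dtype=float)) ** 2
    return (w2 + 56.8e6) * w2**2 / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def levinson_durbin(r, order):
    """Autocorrelation-method LP: returns A(z) coefficients (a[0] = 1) and error power."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def plp_frame(frame, fs, order=5, n_bands=18, n_fft=256):
    """Hypothetical PLP analysis of one frame; order=5 follows the abstract."""
    # 1. Short-term power spectrum of the Hamming-windowed frame.
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(x, n_fft)) ** 2
    bin_bark = hz_to_bark(np.fft.rfftfreq(n_fft, d=1.0 / fs))

    # 2. Critical-band integration around centres spaced evenly on the Bark scale.
    centres = np.linspace(0.0, hz_to_bark(fs / 2.0), n_bands)
    weights = critical_band_weight(bin_bark[None, :] - centres[:, None])
    theta = weights @ power

    # 3. Equal-loudness weighting, then 4. cube-root intensity-loudness compression.
    phi = (equal_loudness(bark_to_hz(centres)) * theta) ** 0.33
    # E(0) = 0 empties the 0-Bark sample and the Nyquist sample lacks its upper
    # skirt, so both are copied from their nearest neighbours.
    phi[0], phi[-1] = phi[1], phi[-2]

    # 5. Autocorrelation = inverse DFT of the real, even auditory spectrum,
    #    followed by the all-pole (autoregressive) fit.
    r = np.fft.irfft(phi)[: order + 1]
    return levinson_durbin(r, order)
```

For a 20 ms frame sampled at 10 kHz, `plp_frame(frame, 10000)` returns the six coefficients of the 5th-order inverse filter A(z) and the prediction-error power. The all-pole step is the ordinary autocorrelation LP method; only its input changes, from the raw power spectrum to the warped, weighted and compressed auditory spectrum.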
Citations
Dissertation
Model-based techniques for noise robust speech recognition
M. J. F. Gales
- 16 Sep 1995
TL;DR: Details the development of Parallel Model Combination, a model-based noise compensation technique that alters the parameters of a set of Hidden Markov Model (HMM) based acoustic models so that they reflect speech spoken in a new acoustic environment.
Patent
Crowd sourcing information to fulfill user requests
Thomas R. Gruber, Adam Cheyer, Donald W. Pitschel
- 15 Mar 2013
TL;DR: A failure to provide a satisfactory response to a user request is detected, and information relevant to the request is crowd-sourced by querying one or more crowd-sourcing information sources.
The subspace Gaussian mixture model-A structured model for speech recognition
Daniel Povey, Lukas Burget, Mohit Agarwal, Pinar Akyazi, Feng Kai, Arnab Ghoshal, Ondřej Glembek, Nagendra Kumar Goel, Martin Karafiat, Ariya Rastrow, Richard Rose, Petr Schwarz, Samuel Thomas
TL;DR: A new approach to speech recognition, in which all Hidden Markov Model states share the same Gaussian Mixture Model (GMM) structure with the same number of Gaussians in each state, appears to give better results than a conventional model.
Proceedings Article
Mel-generalized cepstral analysis - a unified approach to speech spectral estimation.
Keiichi Tokuda, Takao Kobayashi, Takashi Masuko, Satoshi Imai
- 01 Jan 1994
TL;DR: This paper proposes a spectral estimation method which uses the spectral model represented by mel-generalized cepstral coefficients.
Robust speech recognition using the modulation spectrogram
Brian Kingsbury, Nelson Morgan, Steven Greenberg
TL;DR: Using the modulation spectrogram as a front end for ASR significantly improves performance on highly reverberant speech, and combining it with log-RASTA-PLP significantly improves performance over a range of noisy and reverberant conditions, suggesting that the use of multiple representations is another promising method for improving the robustness of ASR systems.
References
Effect of glottal pulse shape on the quality of natural vowels.
TL;DR: A male speaker recorded monosyllabic words and a continuous sentence, and a pitch-synchronous analysis was carried out by digital computer on the vowel portions of these samples. For every pitch period, the analysis provided the formant frequencies, the waveform of the glottal excitation function, and an accurate pitch-period measurement.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Dennis H. Klatt
- 03 May 1982
TL;DR: Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance.