Journal Article, DOI: 10.1121/1.399423
Perceptual linear predictive (PLP) analysis of speech
3.1K
TL;DR: The perceptual linear predictive (PLP) technique is a new method of speech analysis that uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
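The processing chain named in the abstract (critical-band resolution, equal-loudness weighting, intensity-loudness compression, then an all-pole fit) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract gives no formulas, so the Bark warping `6 * asinh(f / 600)`, the equal-loudness approximation, the cube-root compression, the triangular band windows, and the edge-sample handling are all assumptions drawn from commonly cited descriptions of PLP. The function names are hypothetical. Only the default model order of 5 comes from the abstract.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Assumed Bark warping; the abstract does not state a formula.
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)

def equal_loudness(f_hz):
    # Assumed approximation to an equal-loudness curve (angular frequency w).
    w2 = (2.0 * np.pi * np.asarray(f_hz, dtype=float)) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def levinson_durbin(r, order):
    # Solve the autocorrelation normal equations for an all-pole model.
    # Returns coefficients a (with a[0] == 1) and the final prediction error.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def plp_all_pole(frame, sample_rate, order=5):
    # Short-term power spectrum of one windowed frame.
    n = len(frame)
    power = np.abs(np.fft.rfft(frame * np.hamming(n))) ** 2
    bark = hz_to_bark(np.fft.rfftfreq(n, d=1.0 / sample_rate))

    # (1) Critical-band resolution: integrate power in bands spaced about
    # 1 Bark apart. Triangular windows are a simplification (assumption).
    n_bands = int(np.ceil(bark[-1])) + 1
    centers = np.linspace(0.0, bark[-1], n_bands)
    weights = np.clip(1.0 - np.abs(bark[None, :] - centers[:, None]), 0.0, None)
    band_power = weights @ power

    # (2) Equal-loudness weighting at each band's center frequency.
    center_hz = 600.0 * np.sinh(centers / 6.0)
    band_power = band_power * equal_loudness(center_hz)

    # (3) Intensity-loudness power law: cube-root compression (assumption).
    loudness = np.cbrt(band_power)
    # Copy neighbors into the edge samples (assumption), which also avoids
    # the zero weight that the equal-loudness formula gives at 0 Hz.
    loudness[0], loudness[-1] = loudness[1], loudness[-2]

    # All-pole model: the inverse DFT of the auditory spectrum yields an
    # autocorrelation-like sequence, which Levinson-Durbin then fits.
    r = np.fft.irfft(loudness)[: order + 1]
    return levinson_durbin(r, order)
```

Calling `plp_all_pole(frame, 8000)` on a 256-sample frame returns six coefficients (including the leading 1) and a positive prediction error. A real front end would additionally convert the coefficients to cepstra and process overlapping frames, which this sketch leaves out.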
Citations
Cross-Domain and Cross-Language Portability of Acoustic Features Estimated by Multilayer Perceptrons
Andreas Stolcke,Frantisek Grezl,Mei-Yuh Hwang,Xin Lei,Nelson Morgan,Dimitra Vergyri +5 more
- 14 May 2006
TL;DR: It is shown that even without retraining, English-trained MLP features can provide a significant boost to recognition accuracy in new domains within the same language, as well as in entirely different languages such as Mandarin and Arabic.
134
Patent
Multi-tiered voice feedback in an electronic device
James Eric Mason,Jesse Boettcher +1 more
- 01 Sep 2009
TL;DR: This patent describes a voice feedback system that provides voice feedback for displayed speakable elements based on the associated tier of the display of each speakable element and the audio files for each speaker.
133
Segmentation, Diarization and Speech Transcription: Surprise Data Unraveled
Marijn Huijbregts
- 21 Nov 2008
TL;DR: This thesis presents methods that require no external training data for training models; these novel methods have been implemented in a large vocabulary continuous speech recognition system called SHoUT.
Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions
Samuel Thomas,Sriram Ganapathy,George Saon,Hagen Soltau +3 more
- 04 May 2014
TL;DR: CNNs are used as acoustic models for speech activity detection (SAD) on data collected over noisy radio communication channels to illustrate that CNNs have a considerable advantage in fast adaptation for acoustic modeling in these settings.
132
Patent
Digital assistant providing whispered speech
Tuomo Raitio,Melvyn J. Hunt,Hywel Richards,Chinthakunta Madhusudan +3 more
- 15 Sep 2016
TL;DR: This patent describes a system and processes for detecting and/or providing a whispered speech response: speech input is received from a user and, based on that input, it is determined that a whispered speech response is to be provided.
132
References
Effect of glottal pulse shape on the quality of natural vowels.
TL;DR: A male speaker recorded monosyllabic words and a continuous sentence, and a pitch-synchronous analysis of the vowel portions of these samples was carried out on a digital computer. For every pitch period, the analysis provided formant frequencies, the waveform of the glottal excitation function, and an accurate pitch-period measurement.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Dennis H. Klatt
- 03 May 1982
TL;DR: Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance.
349