Journal Article, DOI: 10.1121/1.399423
Perceptual linear predictive (PLP) analysis of speech
3.1K
TL;DR: The perceptual linear predictive (PLP) technique is a new method for the analysis of speech that uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
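The processing chain named in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation. The abstract gives no formulas, so the Bark warping, the masking-curve shape, the equal-loudness approximation, the 0.33 exponent, the band count, the edge-band handling, and all function names are assumptions drawn from commonly cited descriptions of PLP. Only the three psychophysical steps, the all-pole model, and the model order of 5 come from the abstract.

```python
import numpy as np

# NOTE: every constant and helper name here is an assumption for illustration.

def hz_to_bark(f_hz):
    # Assumed Bark warping: 6 * asinh(f / 600)
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)

def bark_to_hz(bark):
    # Inverse of the assumed warping above
    return 600.0 * np.sinh(np.asarray(bark, dtype=float) / 6.0)

def critical_band_weights(d):
    # Assumed asymmetric masking curve; d is distance in Bark from the band centre
    w = np.zeros_like(d)
    lo = (d >= -1.3) & (d < -0.5)
    hi = (d > 0.5) & (d <= 2.5)
    w[lo] = 10.0 ** (2.5 * (d[lo] + 0.5))
    w[np.abs(d) <= 0.5] = 1.0
    w[hi] = 10.0 ** (-(d[hi] - 0.5))
    return w

def equal_loudness(f_hz):
    # Assumed rational approximation of an equal-loudness curve
    w2 = (2.0 * np.pi * np.asarray(f_hz, dtype=float)) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def levinson_durbin(r, order):
    # Solve the autocorrelation normal equations for an all-pole model
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def plp_frame(frame, sample_rate, order=5, n_bands=17):
    frame = np.asarray(frame, dtype=float)
    # Short-term power spectrum of the windowed frame
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    bark = hz_to_bark(freqs)
    centres = np.linspace(0.0, bark[-1], n_bands)
    # (1) critical-band spectral resolution
    bands = critical_band_weights(bark[None, :] - centres[:, None]) @ power
    # (2) equal-loudness weighting at each band centre
    loud = equal_loudness(bark_to_hz(centres)) * bands
    # (3) intensity-loudness power law (assumed exponent 0.33)
    loud = loud ** 0.33
    # Assumed edge handling: copy neighbours into the first and last bands
    loud[0], loud[-1] = loud[1], loud[-2]
    # Autocorrelation from the inverse DFT of the auditory spectrum,
    # then the all-pole fit (order 5, as in the abstract)
    r = np.fft.irfft(loud)[: order + 1]
    return levinson_durbin(r, order)
```

Under these assumptions, calling `plp_frame(frame, 8000)` on a 256-sample frame returns the predictor coefficients (six values with a leading 1) and the residual energy of the 5th-order model.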
Citations
Speech emotion recognition using Support Vector Machines
Thapanee Seehapoch,Sartra Wongthanavasu +1 more
- 06 May 2013
TL;DR: This work attempts to recognize and classify speech emotion across three language databases (the Berlin, Japan, and Thai emotion databases), using Support Vector Machines (SVM) as the classification model.
144
Low-Complexity, Nonintrusive Speech Quality Assessment
TL;DR: A low-complexity algorithm for monitoring the speech quality over a network that can be computed from commonly used speech-coding parameters without explicit distortion modeling is described.
142
Patent
Speaker identification and unsupervised speaker adaptation techniques
Yoon Kim,Sachin S. Kajarekar +1 more
- 27 Aug 2015
TL;DR: In this paper, a speaker identification system for virtual assistants is presented, in which a speaker profile is generated for each user based on the speaker profile for a predetermined user and contextual information is used to verify results produced by the speaker identification process.
142
Human and Machine Hearing: Extracting Meaning from Sound
Richard F. Lyon
- 01 Apr 2017
TL;DR: Richard (Dick) Lyon, a Principal Research Scientist at Google, is well known for his work on models of the auditory system, particularly cochlear models, and for developing analog and digital implementations of those models, in hardware and software.
140
New entropy based combination rules in HMM/ANN multi-stream ASR
Hemant Misra,Hervé Bourlard,Vivek Tyagi +2 more
- 06 Apr 2003
TL;DR: Three new entropy based combination rules are tested in a full-combination multi-stream HMM/ANN system for noise robust speech recognition by combining all the classifiers having entropy below average using a weighting proportional to their inverse entropy.
140
References
Effect of glottal pulse shape on the quality of natural vowels.
TL;DR: In this article, a male speaker recorded monosyllabic words and a continuous sentence, and a pitch-synchronous analysis was carried out by a digital computer on the vowel portions of these samples. For every pitch period, the analysis provided formant frequencies, the waveform of the glottal excitation function, and an accurate pitch-period measurement.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Dennis H. Klatt
- 03 May 1982
TL;DR: Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance.
349