Journal Article, DOI: 10.1121/1.399423
Perceptual linear predictive (PLP) analysis of speech
3.1K
TL;DR: Presents the perceptual linear predictive (PLP) technique for speech analysis, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
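The processing chain named in the abstract (critical-band resolution, equal-loudness weighting, intensity-loudness compression, then an all-pole fit) can be sketched for a single frame as below. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the abstract gives no formulas, so the Bark warping, the equal-loudness approximation, and the 0.33 exponent are the forms commonly quoted for PLP, the triangular band weights are a simplification swapped in for the critical-band masking curve, and the names `plp_frame`, `n_bands`, and `levinson_durbin` are hypothetical.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Bark warping commonly quoted for PLP (assumed; not given in the abstract).
    return 6.0 * np.arcsinh(f_hz / 600.0)

def equal_loudness(f_hz):
    # Commonly quoted equal-loudness approximation (assumed form).
    w2 = (2.0 * np.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def levinson_durbin(r, order):
    # Solve the autocorrelation normal equations for an all-pole model.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err

def plp_frame(frame, sample_rate, order=5, n_bands=21):
    # Short-term power spectrum of the windowed frame.
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # (1) Critical-band resolution: integrate power in bands spaced evenly
    # on the Bark axis. Triangular weights are a simplification (assumption).
    bark = hz_to_bark(freqs)
    centers = np.linspace(0.0, bark[-1], n_bands)
    width = centers[1] - centers[0]
    weights = np.maximum(
        0.0, 1.0 - np.abs(bark[None, :] - centers[:, None]) / width)
    band_energy = weights @ power

    # (2) Equal-loudness weighting at each band center, then
    # (3) intensity-loudness power law (cube-root-like compression).
    center_hz = 600.0 * np.sinh(centers / 6.0)
    loud = (equal_loudness(center_hz) * band_energy) ** 0.33
    # The edge bands (0 Hz and Nyquist) are copied from their neighbours.
    loud[0], loud[-1] = loud[1], loud[-2]

    # All-pole fit: the inverse DFT of the auditory spectrum gives an
    # autocorrelation sequence, which Levinson-Durbin turns into coefficients.
    r = np.fft.irfft(loud)[: order + 1]
    return levinson_durbin(r, order)

# Usage: one 25 ms frame at 16 kHz (synthetic tone plus noise).
rng = np.random.default_rng(0)
t = np.arange(400) / 16000.0
frame = np.sin(2 * np.pi * 500.0 * t) + 0.1 * rng.standard_normal(400)
coeffs, residual = plp_frame(frame, 16000)
```

The default `order=5` mirrors the 5th-order model mentioned in the abstract; the band count and frame length are illustrative choices only.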
Citations
Autoencoder based multi-stream combination for noise robust speech recognition
Sri Harish Mallidi,Tetsuji Ogawa,Karel Veselý,Phani Sankar Nidadavolu,Hynek Hermansky +4 more
- 06 Sep 2015
TL;DR: This work proposes using autoencoders, which are multi-layer feed-forward neural networks, to estimate a confidence measure, and shows that the reconstruction error of the autoencoder is correlated with the robustness of the corresponding stream.
21
Patent
Voice Activity Detection Using A Soft Decision Mechanism
Ron Wein
- 01 Aug 2014
TL;DR: In this article, a robust VAD algorithm that is also language independent is presented, where instead of classifying short segments of the audio as either speech or silence, the VAD as disclosed herein employs a soft-decision mechanism.
21
Detecting Emotions in Mandarin Speech
Tsang-Long Pao,Yu-Te Chen,Jun-Heng Yeh,Jhih-Jheng Lu +3 more
- 01 Sep 2004
TL;DR: A Mandarin-speech emotion classification method built on three classification techniques (LDA, K-NN and HMMs); the results show that the selected features are robust and effective for emotion recognition in the valence and arousal dimensions of the two corpora.
Computer-implemented methods and systems for modeling and recognition of speech
TL;DR: In this article, a time-to-frequency domain transformation is performed on at least a portion of the received signal to generate a frequency domain representation, which is then converted from a time domain representation to the frequency domain.
21
Speaker Recognition: Advancements and Challenges
Homayoon Beigi
- 28 Nov 2012
TL;DR: A review of the most recent literature is presented and the latest techniques which are being deployed in the various branches of this technology are briefly visited.
21
References
Effect of glottal pulse shape on the quality of natural vowels.
TL;DR: In this article, a male speaker recorded monosyllabic words and a continuous sentence, and a pitch-synchronous analysis was carried out by a digital computer on the vowel portions of these samples. For every pitch period, the analysis provided formant frequencies, the waveform of the glottal excitation function, and an accurate pitch-period measurement.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Dennis H. Klatt
- 03 May 1982
TL;DR: Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance.
349