Journal Article, DOI: 10.1121/1.399423
Perceptual linear predictive (PLP) analysis of speech
3.1K
TL;DR: The perceptual linear predictive (PLP) technique is a new method for the analysis of speech. It uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
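The stages named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. The Bark-warping and equal-loudness formulas are the ones commonly quoted for PLP and are assumptions here, since the abstract gives no equations. The function names are hypothetical. Critical-band smoothing and the inverse DFT that turns the compressed spectrum into autocorrelation values are omitted.

```python
import math


def hz_to_bark(f_hz):
    """Warp frequency in Hz to the Bark scale (assumed form: 6 * asinh(f / 600))."""
    return 6.0 * math.asinh(f_hz / 600.0)


def equal_loudness(f_hz):
    """Approximate equal-loudness weight at f_hz (assumed formula, see text above)."""
    w2 = (2.0 * math.pi * f_hz) ** 2  # squared angular frequency
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def compress(power):
    """Intensity-loudness power law, taken here as a cube root of the power."""
    return power ** (1.0 / 3.0)


def levinson_durbin(r, order):
    """Fit an autoregressive all-pole model to autocorrelation values r[0..order].

    Returns (a, err): predictor polynomial a with a[0] == 1, and the
    final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err
```

In a full pipeline, the short-term power spectrum would be warped with `hz_to_bark`, weighted with `equal_loudness`, compressed with `compress`, and converted to autocorrelation values before calling `levinson_durbin` with `order=5`, the model order the abstract reports as effective.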
Citations
Deep Scattering Spectra with Deep Neural Networks for LVCSR Tasks
Tara N. Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo
- 14 Sep 2014
TL;DR: This paper explores the optimal multi-resolution time and frequency scattering operations for LVCSR tasks, as well as techniques to reduce the dimension of the DSS features. These features are similar to multi-resolution log-mel + MFCCs, and similar improvements can be obtained with that representation.
22
Using Phonologically Weighted Levenshtein Distances for the Prediction of Microscopic Intelligibility
Lionel Fontan, Isabelle Ferrané, Jérôme Farinas, Julien Pinquier, Xavier Aumont
- 08 Sep 2016
TL;DR: A new method for analyzing automatic speech recognition (ASR) results at the phonological feature level, which makes it possible to survey feature additions or deletions and provides microscopic qualitative information as a complement to word recognition scores.
22
Lattice segmentation and minimum Bayes risk discriminative training for large vocabulary continuous speech recognition
Vlasios Doumpiotis, William Byrne
TL;DR: Refinement of the search space that allows the use of specialized discriminative models is shown to be an improvement over rescoring with conventionally trained discriminative models.
22
Hybrid Signal-and-Link-Parametric Speech Quality Measurement for VoIP Communications
Tiago H. Falk, Wai-Yip Chan
TL;DR: In this article, a hybrid signal-and-link-parametric approach to speech quality measurement for voice-over-Internet protocol (VoIP) communications is described, which is tested on speech degraded by acoustic noise, temporal clippings, and noise suppression artifacts, thus simulating degradations present in wireless-VoIP tandem connections.
Dissertation
Acoustic feature combination for speech recognition
András Zolnay, Hermann Ney
- 01 Jan 2006
TL;DR: The results show that the accuracy of automatic speech recognition systems can be significantly improved by the combination of auditory and articulatory motivated features.
22
References
Effect of glottal pulse shape on the quality of natural vowels.
TL;DR: In this article, a male speaker recorded monosyllabic words and a continuous sentence, and a pitch-synchronous analysis was carried out by a digital computer on the vowel portions of these samples. For every pitch period, the analysis provided formant frequencies, the waveform of the glottal excitation function, and an accurate pitch-period measurement.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Dennis H. Klatt
- 03 May 1982
TL;DR: Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance.
349