Journal Article, DOI: 10.1121/1.399423
Perceptual linear predictive (PLP) analysis of speech
3.1K
TL;DR: The perceptual linear predictive (PLP) technique is a new method for the analysis of speech that uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
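The processing chain named in the abstract (critical-band resolution, equal-loudness weighting, intensity-loudness compression, then an all-pole fit) can be sketched for a single frame as below. This is a minimal illustration, not the paper's implementation. The abstract gives no formulas, so the Bark warping `6 * asinh(f / 600)`, the equal-loudness approximation, the cube-root exponent, the triangular band weights, and all function names (`hz_to_bark`, `equal_loudness`, `levinson`, `plp_frame`) are assumptions taken from commonly cited descriptions of PLP or chosen for brevity.

```python
import numpy as np

def hz_to_bark(f_hz):
    # Assumed Bark warping: 6 * asinh(f / 600), as commonly quoted for PLP.
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)

def equal_loudness(f_hz):
    # Assumed approximation of the equal-loudness curve (angular frequency w).
    w2 = (2.0 * np.pi * np.asarray(f_hz, dtype=float)) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def levinson(r, order):
    # Levinson-Durbin recursion: autocorrelation r[0..order] -> all-pole
    # coefficients a (with a[0] == 1) and the final prediction error.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def plp_frame(frame, sample_rate, order=5, n_fft=512):
    # Short-term power spectrum of the Hamming-windowed frame.
    windowed = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2
    bark = hz_to_bark(np.fft.rfftfreq(n_fft, d=1.0 / sample_rate))

    # (1) Critical-band resolution: bands spaced roughly 1 Bark apart.
    # Triangular weights are a simplification (assumption), not the
    # masking-curve shape used in the PLP literature.
    n_bands = int(np.ceil(bark[-1])) + 1
    centres = np.linspace(0.0, bark[-1], n_bands)
    step = centres[1] - centres[0]
    dist = np.abs(bark[None, :] - centres[:, None]) / step
    weights = np.clip(1.0 - dist, 0.0, None)
    band_energy = weights @ power

    # (2) Equal-loudness weighting at the band centre frequencies.
    centre_hz = 600.0 * np.sinh(centres / 6.0)
    loud = equal_loudness(centre_hz) * band_energy
    # The end bands are poorly defined; copy their neighbours (assumption).
    loud[0], loud[-1] = loud[1], loud[-2]

    # (3) Intensity-loudness power law: cube-root compression (assumption).
    auditory = np.cbrt(loud)

    # All-pole model: the inverse DFT of the auditory spectrum gives an
    # autocorrelation-like sequence, which Levinson-Durbin then fits.
    r = np.fft.irfft(auditory)[: order + 1]
    a, err = levinson(r, order)
    return auditory, a, err
```

Calling `plp_frame(frame, 8000)` on a 25 ms frame sampled at 8 kHz returns the compressed auditory spectrum, six all-pole coefficients for the 5th-order model mentioned in the abstract, and the residual prediction error. The low default order reflects the abstract's point that a 5th-order model suppresses speaker-dependent detail.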
Citations
Handbook of Natural Language Processing and Machine Translation
Joseph Olive,Caitlin Christianson,John McCary +2 more
- 01 Jan 2011
TL;DR: This comprehensive handbook, written by leading experts in the field, details the groundbreaking research conducted under the Global Autonomous Language Exploitation (GALE) program of the Defense Advanced Research Projects Agency (DARPA), placing it in the context of previous research in natural language and signal processing, artificial intelligence, and machine translation.
150
Multi-resolution RASTA filtering for TANDEM-based ASR
Hynek Hermansky,Petr Fousek +1 more
- 04 Sep 2005
TL;DR: A new speech representation, based on multiple filtering of temporal trajectories of speech energies in frequency sub-bands, is proposed and tested; the representation is inherently robust to linear distortions.
Spectral Features for Automatic Text-Independent Speaker Recognition
Tomi Kinnunen
- 01 Jan 2003
TL;DR: This thesis attempts to see the feature extraction as a whole, starting from understanding the speech production process, what is known about speaker individuality, and then going to the methods adopted directly from the speech recognition task.
149
Objective Assessment of Speech and Audio Quality—Technology and Applications
TL;DR: An overview of the field is provided, outlining the main approaches to intrusive, nonintrusive and parametric models and discussing some of their limitations and areas of future work.
148
Patent
Methods and apparatus for altering audio output signals
Michael M. Lee
- 02 Apr 2008
TL;DR: This patent describes a system that alters an audio output so that, when the audio data file is played back, it sounds as if a different person had recorded it.
148
References
Effect of glottal pulse shape on the quality of natural vowels.
TL;DR: A male speaker recorded monosyllabic words and a continuous sentence, and a pitch-synchronous analysis of the vowel portions of these samples was carried out on a digital computer. For every pitch period, the analysis provided formant frequencies, the waveform of the glottal excitation function, and an accurate pitch-period measurement.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Dennis H. Klatt
- 03 May 1982
TL;DR: Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance.
349