Journal Article
DOI: 10.1121/1.399423
Perceptual linear predictive (PLP) analysis of speech
3.1K
TL;DR: The perceptual linear predictive (PLP) technique for the analysis of speech uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
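The processing chain named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the Bark warping, masking-curve, and equal-loudness formulas are the forms commonly quoted for PLP and should be checked against the paper, and the function names, the roughly 1-Bark band spacing, the 512-point FFT, and the orientation of the asymmetric masking slopes are all assumptions made here.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Bark warping in the form commonly quoted for PLP: 6 * asinh(f / 600)."""
    return 6.0 * np.arcsinh(np.asarray(f_hz, dtype=float) / 600.0)

def bark_to_hz(z):
    """Inverse of hz_to_bark."""
    return 600.0 * np.sinh(np.asarray(z, dtype=float) / 6.0)

def critical_band_weights(d):
    """Piecewise masking curve over d = (bin Bark - band-centre Bark).
    Which side gets the steep slope under this sign convention is an assumption."""
    d = np.asarray(d, dtype=float)
    w = np.zeros_like(d)
    lo = (d >= -1.3) & (d <= -0.5)
    mid = (d > -0.5) & (d < 0.5)
    hi = (d >= 0.5) & (d <= 2.5)
    w[lo] = 10.0 ** (2.5 * (d[lo] + 0.5))
    w[mid] = 1.0
    w[hi] = 10.0 ** (-1.0 * (d[hi] - 0.5))
    return w

def equal_loudness(f_hz):
    """Equal-loudness weighting (commonly quoted approximation, angular frequency)."""
    w2 = (2.0 * np.pi * np.asarray(f_hz, dtype=float)) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for an all-pole model.
    Returns (a, err) with a[0] == 1 and err the final prediction error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def plp_frame(frame, sample_rate, order=5, n_fft=512):
    """All-pole model of the estimated auditory spectrum for one speech frame."""
    frame = np.asarray(frame, dtype=float)
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bark = hz_to_bark(freqs)

    # (1) Critical-band resolution: integrate power in bands about 1 Bark apart.
    n_bands = int(np.ceil(bark[-1])) + 1
    centres = np.linspace(0.0, bark[-1], n_bands)
    weights = critical_band_weights(bark[None, :] - centres[:, None])
    band_energy = weights @ power

    # (2) Equal-loudness weighting at the band centres; the two edge bands
    # are copied from their neighbours (the weighting is zero at 0 Hz).
    loud = equal_loudness(bark_to_hz(centres)) * band_energy
    loud[0], loud[-1] = loud[1], loud[-2]

    # (3) Intensity-loudness power law: cube-root amplitude compression.
    compressed = loud ** 0.33

    # Inverse DFT of the (implicitly symmetric) spectrum gives autocorrelation
    # values, from which the all-pole model is fitted.
    r = np.fft.irfft(compressed)[: order + 1]
    return levinson_durbin(r, order)
```

Calling `plp_frame(frame, 8000)` on a 20 to 25 ms frame returns the six coefficients of a 5th-order all-pole model together with the residual prediction error; cepstral or other features would be derived from these coefficients in a later step that is not shown here.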
Citations
Language Identification: A Tutorial
TL;DR: This tutorial presents an overview of the progression of spoken language identification (LID) systems and current developments, and presents evaluations of the LID system using NIST language recognition evaluation tasks.
163
Patent
Intelligent assistant for home automation
Ryan M. Orr,Garett R. Nell,Benjamin Lloyd Brumbaugh +2 more
- 31 Mar 2015
TL;DR: Describes a system for using a virtual assistant to control electronic devices, in which a user can speak an input in natural language form to a user device, which can forward the commands to the appropriate one or more electronic devices for execution.
162
Patent
Switching between text data and audio data based on a mapping
Alan C. Cannistraro,Gregory S. Robbin,Casey M. Dougherty,Raymond Walsh,Melissa Breglio Hajj +4 more
- 06 Oct 2011
TL;DR: Presents a mapping between audio data and text data, created either automatically or manually, which can be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context.
160
Speaker Identification Using Instantaneous Frequencies
Marco Grimaldi,Fred Cummins +1 more
TL;DR: Presents a novel parametrization of speech based on the AM-FM representation of the speech signal, and assesses the utility of these features in the context of speaker identification.
158
Patent
Reducing the need for manual start/end-pointing and trigger phrases
Philippe P. Piernot,Justin G. Binder +1 more
- 27 May 2015
TL;DR: In this paper, a system for selectively processing and responding to a spoken user input is presented, based on contextual information and a rule-based system or a probabilistic system.
157
References
Effect of glottal pulse shape on the quality of natural vowels.
TL;DR: In this article, a male speaker recorded monosyllabic words and a continuous sentence, and a pitch-synchronous analysis was carried out by a digital computer on the vowel portions of these samples. For every pitch period, the analysis provided formant frequencies, the waveform of the glottal excitation function, and an accurate pitch-period measurement.
Prediction of perceived phonetic distance from critical-band spectra: A first step
Dennis H. Klatt
- 03 May 1982
TL;DR: Judgements of phonetic distance between pairs of static synthetic vowels and fricatives have been collected in which the stimulus ensemble included formant frequency changes and a number of acoustic changes that turn out to have little phonetic relevance.
349