Journal Article · DOI: 10.1121/1.399423
Perceptual linear predictive (PLP) analysis of speech
3.1K
TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
Abstract: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
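The three hearing-based steps named in the abstract (critical-band resolution, equal-loudness weighting, and the intensity-loudness power law) can be sketched in plain Python. The Bark warping Ω = 6·asinh(f/600), the piecewise critical-band curve, and the equal-loudness expression below are the forms commonly quoted for PLP. The band spacing, the orientation of the asymmetric masking curve, and all function names are assumptions of this sketch, not a transcription of the paper.

```python
import math
from typing import List


def hz_to_bark(f_hz: float) -> float:
    """Bark warping Omega = 6 * asinh(f / 600), the form commonly quoted for PLP."""
    return 6.0 * math.asinh(f_hz / 600.0)


def critical_band_weight(d_bark: float) -> float:
    """Piecewise critical-band masking curve Psi evaluated at an offset in Bark."""
    if d_bark < -1.3 or d_bark > 2.5:
        return 0.0
    if d_bark <= -0.5:
        return 10.0 ** (2.5 * (d_bark + 0.5))   # steep skirt
    if d_bark < 0.5:
        return 1.0                              # flat top, 1 Bark wide
    return 10.0 ** (-1.0 * (d_bark - 0.5))      # shallow skirt


def equal_loudness(f_hz: float) -> float:
    """Equal-loudness weighting E(w) with w = 2*pi*f (approaches 1 at high f)."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def auditory_spectrum(power: List[float], sample_rate: float,
                      n_bands: int = 17) -> List[float]:
    """Map a one-sided power spectrum (bins 0..Nyquist) to a compressed
    critical-band spectrum. Band count and spacing are assumptions."""
    n_bins = len(power)
    freqs = [k * (sample_rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    barks = [hz_to_bark(f) for f in freqs]
    # Hypothetical choice: band centres evenly spaced in Bark below Nyquist.
    step = barks[-1] / (n_bands + 1)
    centres = [step * (i + 1) for i in range(n_bands)]
    out = []
    for zc in centres:
        # (1) Critical-band integration. Orientation is assumed: each band
        # gathers bins from up to 2.5 Bark below to 1.3 Bark above its centre.
        theta = sum(p * critical_band_weight(zc - zb)
                    for p, zb in zip(power, barks))
        # (2) Equal-loudness weighting at the band's centre frequency.
        fc = 600.0 * math.sinh(zc / 6.0)
        xi = equal_loudness(fc) * theta
        # (3) Intensity-loudness power law: cube-root amplitude compression.
        out.append(xi ** 0.33)
    return out
```

For example, `auditory_spectrum([1.0] * 257, 10000.0)` turns a flat 257-bin spectrum sampled at 10 kHz into 17 compressed band values. The equal-loudness step makes the low-frequency bands smaller than the rest.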
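The final modeling step in the abstract, which approximates the auditory spectrum with an autoregressive all-pole model, can be sketched as standard autocorrelation-method LP. This sketch assumes that the autocorrelation is obtained as the inverse DFT of the (real, even) auditory spectrum and that the predictor comes from the Levinson-Durbin recursion. The default order of 5 follows the abstract. The function names and the use of the prediction-error power as the model gain are assumptions.

```python
import cmath
import math
from typing import List, Tuple


def autocorrelation_from_spectrum(spec: List[float], n_lags: int) -> List[float]:
    """Inverse DFT of a real, even power spectrum sampled on [0, pi] inclusive."""
    # Mirror to a full symmetric period of length 2 * (len(spec) - 1).
    full = spec + spec[-2:0:-1]
    m = len(full)
    return [sum(full[k] * math.cos(2.0 * math.pi * k * lag / m) for k in range(m)) / m
            for lag in range(n_lags + 1)]


def levinson_durbin(r: List[float], order: int) -> Tuple[List[float], float]:
    """Solve the Yule-Walker equations for A(z) = 1 + sum_j a[j] z^-j.
    Returns (a, err) with a[0] = 1 and err the prediction-error power."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def all_pole_model(aud_spec: List[float], order: int = 5) -> Tuple[List[float], float]:
    """Fit an all-pole model of the given order to an auditory spectrum."""
    r = autocorrelation_from_spectrum(aud_spec, order)
    return levinson_durbin(r, order)


def model_spectrum(a: List[float], gain: float, n_points: int) -> List[float]:
    """Evaluate gain / |A(e^{jw})|^2 on n_points frequencies from 0 to pi."""
    out = []
    for k in range(n_points):
        w = math.pi * k / (n_points - 1)
        big_a = sum(c * cmath.exp(-1j * w * j) for j, c in enumerate(a))
        out.append(gain / abs(big_a) ** 2)
    return out
```

A low model order keeps only the broad spectral peaks. This is the sense in which the abstract describes a 5th-order model as suppressing speaker-dependent detail. Raising `order` lets the model follow finer structure again.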
Citations
Book
Perceptually inspired signal processing strategies for robust speech recognition in reverberant environments
Brian Kingsbury, Nelson Morgan +1 more
- 01 Jan 1998
TL;DR: This work presents perceptually inspired signal-processing strategies for robust speech recognition in reverberant environments, a novel approach to signal processing that automates the labor-intensive, and therefore time-consuming and expensive, process of recognizing speech.
75
Patent
Systems and methods for structured stem and suffix language models
Jerome R. Bellegarda, Sibel Yaman +1 more
- 31 Aug 2015
TL;DR: The authors use a structured stem-and-suffix n-gram language model to predict words from a pre-existing word in the received input, and provide the predicted word as output to the user.
75
The Ring Array Processor: a multiprocessing peripheral for connectionist applications
TL;DR: The motivation for the RAP is described, and the authors show how the architecture matches the target algorithm, which is to reduce peak performance on the error back-propagation algorithm to about 50% of a linear speedup.
75
A 22nm, 10.8 μW/15.1 μW Dual Computing Modes High Power-Performance-Area Efficiency Domained Background Noise Aware Keyword-Spotting Processor
Bo Liu, Hao Cai, Zhen Wang, Sun Yuhao, Shen Zeyu, Wentao Zhu, Li Yan, Yu Gong, Ge Wei, Jun Yang, Longxing Shi +10 more
TL;DR: This paper proposes a background-noise-aware keyword-spotting (KWS) processor with high power-performance-area efficiency. It is based on an optimized binarized weight network (BWN) and is adaptively configured to use dual computing modes: high recognition accuracy under high background noise and ultra-low power consumption under low background noise.
74
Proceedings Article
Cross-lingual and multi-stream posterior features for low resource LVCSR systems.
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky +2 more
- 01 Jan 2010
TL;DR: This work proposes training low-resource LVCSR systems with additional sources of information, such as annotated data from other languages (German and Spanish) and various acoustic feature streams (short-term and modulation features), and training multilayer perceptrons (MLPs) on these sources of information.