Open Access • Proceedings Article • 10.1109/ITA.2010.5454172

High-dimensional linear representations for robust speech recognition

- 26 Apr 2010

- pp 1-5

TL;DR: A generative framework for phoneme classification using linear features is developed. In the presence of additive noise, classification in this framework outperforms an analogous PLP classifier, adapted to noise using cepstral mean and variance normalisation, at SNRs below 18 dB.
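The PLP baseline is described as adapted to noise with cepstral mean and variance normalisation (CMVN). A minimal sketch of per-utterance CMVN follows. It assumes NumPy and statistics computed over all frames of a single utterance; the paper's exact normalisation window is not given here, and the function name `cmvn` is illustrative.

```python
import numpy as np


def cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-utterance cepstral mean and variance normalisation (sketch).

    features: array of shape (num_frames, num_coeffs), e.g. PLP cepstra
    for one utterance. Each coefficient is shifted to zero mean and scaled
    to unit variance across the frames of that utterance. The statistics
    window (whole utterance) is an assumption, not taken from the paper.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # eps guards against division by zero for a constant coefficient.
    return (features - mean) / (std + eps)
```

Because CMVN matches only the mean and variance of each coefficient, it can only partially compensate for the nonlinear way additive noise distorts cepstral features.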

Abstract: Phoneme classification is investigated in linear feature domains with the aim of improving the robustness to additive noise. Linear feature domains allow for exact noise adaptation and so should result in more accurate classification than representations involving nonlinear processing and dimensionality reduction. We develop a generative framework for phoneme classification using linear features. We first show results for a representation consisting of concatenated frames from the centre of the phoneme, each containing f frames. As no single f is optimal for all phonemes, we further average over models with a range of values of f. Next we improve results by including information from the entire phoneme. In the presence of additive noise, classification in this framework performs better than an analogous PLP classifier, adapted to noise using cepstral mean and variance normalisation, below 18 dB SNR.
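The claim that linear feature domains permit exact noise adaptation can be made concrete for a Gaussian class model. If a clean feature vector is s ~ N(μ_c, Σ_c) and independent zero-mean noise n ~ N(0, Σ_n) is added, then x = s + n ~ N(μ_c, Σ_c + Σ_n) exactly. The sketch below assumes one full-covariance Gaussian per class (the paper uses GMMs; for a mixture the same covariance update applies to every component), a known noise covariance, and equal class priors. Function names are hypothetical.

```python
import numpy as np


def gaussian_loglik(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Log-density of the multivariate Gaussian N(mean, cov) at x."""
    d = x.shape[-1]
    _, logdet = np.linalg.slogdet(cov)
    diff = x - mean
    # Solve a linear system instead of inverting cov, for numerical stability.
    maha = diff @ np.linalg.solve(cov, diff)
    return float(-0.5 * (d * np.log(2 * np.pi) + logdet + maha))


def classify_in_noise(x, class_means, class_covs, noise_cov) -> int:
    """Return the class index maximising p(x | c) under additive noise.

    Assumes x = s + n with s ~ N(mu_c, Sigma_c) and independent noise
    n ~ N(0, noise_cov), so the noisy-speech model is obtained exactly by
    adding noise_cov to each clean-speech covariance. Equal priors assumed.
    """
    scores = [
        gaussian_loglik(x, mu, cov + noise_cov)
        for mu, cov in zip(class_means, class_covs)
    ]
    return int(np.argmax(scores))
```

In a nonlinear domain such as PLP cepstra, the noisy-speech distribution is no longer obtained by this simple covariance sum, which is the contrast the abstract draws.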

Table 1. Phoneme duration [ms] in the training data, grouped by broad phonetic class.

Fig. 2. Comparison of existing phoneme representations. Top: division described in [12], resulting in five sectors: three covering the duration of the phoneme and two of 40 ms over the transitions. Bottom: the f frames closest to the five points A, B, C, D and E (the centres of the regions above) are selected to map the phoneme segment to five feature vectors x_A, x_B, x_C, x_D and x_E.
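The frame-selection step in Fig. 2 can be sketched as follows. The sector layout is an assumption not stated in the caption: the phoneme is split into three equal thirds, each 40 ms transition sector is centred on a phoneme boundary (so its centre is the boundary itself), and A to E are taken in time order. Frame centres are given in milliseconds, and all function names are hypothetical.

```python
import numpy as np


def anchor_points(start_ms: float, end_ms: float) -> np.ndarray:
    """Centres of the five sectors A..E for a phoneme spanning [start_ms, end_ms].

    Assumed layout (not from the paper): equal thirds inside the phoneme,
    and 40 ms transition sectors centred on the two boundaries.
    """
    dur = end_ms - start_ms
    return np.array([
        start_ms,                 # A: start-boundary transition sector
        start_ms + dur / 6,       # B: centre of the first third
        start_ms + dur / 2,       # C: centre of the middle third
        start_ms + 5 * dur / 6,   # D: centre of the last third
        end_ms,                   # E: end-boundary transition sector
    ], dtype=float)


def select_frames(frame_centres_ms, frames: np.ndarray, point_ms: float, f: int) -> np.ndarray:
    """Concatenate, in time order, the f frames whose centres are closest to point_ms."""
    dist = np.abs(np.asarray(frame_centres_ms, dtype=float) - point_ms)
    idx = np.sort(np.argsort(dist, kind="stable")[:f])
    return frames[idx].reshape(-1)


def phoneme_to_vectors(frame_centres_ms, frames, start_ms, end_ms, f):
    """Map one phoneme segment to the five feature vectors x_A, ..., x_E."""
    return [select_frames(frame_centres_ms, frames, p, f)
            for p in anchor_points(start_ms, end_ms)]
```

Each returned vector has length f times the frame dimension, which is what makes the representation high-dimensional when frames are raw waveform segments.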

Fig. 1. Model averaging for acoustic waveforms, MFCC and PLP models, all trained and tested in quiet conditions. Solid: GMMs with number of components shown; dashed: average over models up to number of components shown. The model average reduces the error rate in all cases.
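Fig. 1 compares single GMMs with an average over models up to a given number of components. One plausible way to form such an average, sketched here under the assumption of equal model weights and equal class priors, is to average the class-conditional likelihoods of the individual models and pick the most likely class. The same routine would apply to the average over frame counts f mentioned in the abstract, with one row per value of f. Function names are illustrative.

```python
import numpy as np


def averaged_log_likelihood(log_liks: np.ndarray) -> np.ndarray:
    """Average class likelihoods over a set of models.

    log_liks has shape (num_models, num_classes); entry [m, c] is
    log p_m(x | c) from model m (e.g. a GMM with m + 1 components).
    Returns log((1 / M) * sum_m p_m(x | c)) per class, using the
    log-sum-exp trick because high-dimensional likelihoods underflow.
    """
    num_models = log_liks.shape[0]
    peak = log_liks.max(axis=0)
    return peak + np.log(np.exp(log_liks - peak).sum(axis=0)) - np.log(num_models)


def classify_model_average(log_liks: np.ndarray) -> int:
    """Pick the class with the largest averaged likelihood (equal priors assumed)."""
    return int(np.argmax(averaged_log_likelihood(log_liks)))
```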

Fig. 4. Performance of the classifiers in pink noise extracted from NOISEX-92. Curves are shown for the best representation from Fig. 3 using the f-average. The dotted line indicates the chance level at 93.5%.
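Noisy test conditions such as those in Fig. 4 are typically produced by scaling a noise recording to a target SNR and adding it to clean speech. The sketch below assumes SNR is defined from mean power over the whole signal; whether the paper measures power this way (rather than, say, over speech-active regions only) is not stated here, and the helper name is illustrative.

```python
import numpy as np


def add_noise_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale noise so that 10 * log10(P_speech / P_noise) == snr_db, then add it.

    Power is the mean squared sample value over the whole signal. The noise
    excerpt must be at least as long as the speech; it is truncated to match.
    """
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # After scaling, noise power equals p_speech / 10**(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```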

Fig. 3. Error rates of the different representations tested in quiet conditions, showing the improvement of the sector sum over the model average as a function of f, the number of frames from each sector.
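One plausible reading of the "sector sum" in Fig. 3 is that each sector A to E has its own class-conditional model and the per-sector log-likelihoods are added, so that the whole phoneme contributes to the decision. The sketch below assumes exactly that, treats the sectors as conditionally independent given the class, and assumes equal class priors; the per-sector scores could themselves be f-averaged. The function name is hypothetical.

```python
import numpy as np


def sector_sum_classify(sector_log_liks: np.ndarray) -> int:
    """Classify a phoneme from per-sector class scores.

    sector_log_liks has shape (num_sectors, num_classes); entry [s, c] is
    log p(x_s | c) from a model trained on sector s (s = A, ..., E).
    Summing over sectors assumes they are conditionally independent
    given the class; equal class priors are assumed.
    """
    total = sector_log_liks.sum(axis=0)
    return int(np.argmax(total))
```

For example, a single sector that favours one class can be outvoted by the remaining sectors, which is how evidence from the entire phoneme enters the decision.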

Citations

Proceedings Article

Consonant and vowel confusions in speech-weighted noise.

Sandeep A. Phatak, +1 more

- 01 Jan 2006

TL;DR: In this paper, the results of a closed-set recognition task for 64 consonant-vowel sounds (16 C × 4 V, spoken by 18 talkers) in speech-weighted noise (-22, -20, -16, -10, -2 dB) and in quiet were presented.

Proceedings Article • 10.1109/ISIT.2011.6034260

Combined waveform-cepstral representation for robust speech recognition

Matthew Ager, +2 more

- 01 Jul 2011

TL;DR: A convex combination of acoustic waveforms and cepstral features is considered and it achieves higher accuracy than either of the individual representations across all noise levels.

References

Journal Article • 10.1121/1.399423

Perceptual linear predictive (PLP) analysis of speech

Hynek Hermansky

- 01 Apr 1990

- Journal of the Acoustical Society of Ame...

TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.

Journal Article • 10.1121/1.1907526

An analysis of perceptual confusions among some English consonants.

George A. Miller, +1 more

- 01 Mar 1955

- Journal of the Acoustical Society of Ame...

TL;DR: In this paper, an articulatory analysis of 16 English consonants was performed over voice communication systems with frequency distortion and with random masking noise. The listeners were forced to guess at every sound and a count was made of all the different errors that resulted when one sound was confused with another.

Journal Article • 10.1109/29.46546

Speaker-independent phone recognition using hidden Markov models

Kai-Fu Lee, +1 more

- 01 Nov 1989

- IEEE Transactions on Acoustics, Speech, ...

TL;DR: The authors introduce the co-occurrence smoothing algorithm, which enables accurate recognition even with very limited training data; the reported results can serve as benchmarks for evaluating future systems.

Journal Article • 10.1016/S0167-6393(97)00021-6

Speech recognition by machines and humans

Richard P. Lippmann

- 01 Jul 1997

- Speech Communication

TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness with noise and channel variability, and on more accurately modeling spontaneous speech.

Proceedings Article • 10.1109/ICASSP.1999.759734

On the use of support vector machines for phonetic classification

P. Clarkson, +1 more

- 15 Mar 1999

TL;DR: This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition, presents results on several standard vowel and phonetic classification tasks, and shows better performance than Gaussian mixture classifiers.

Related Papers (5)

Time-domain isolated phoneme classification using reconstructed phase spaces

Random discriminant structure analysis for automatic recognition of connected vowels

MLP-based isolated phoneme classification using likelihood features extracted from reconstructed phase space

Speech feature analysis using variational Bayesian PCA

Unsupervised Data-Driven Feature Vector Normalization With Acoustic Model Adaptation for Robust Speech Recognition