Reducing errors by increasing the error rate: MLP Acoustic Modeling for Broadcast News Transcription

doi:10.7916/D8MW2SJ1

- 01 Jan 1999

10

TL;DR: Some aspects of a Broadcast News recognition system based on hybrid HMM/MLP acoustic modeling are described, including the use of novel ‘modulation spectrogram’ features, which are combined with conventional models at the posterior probability level, and an investigation of the interaction of model size and training set size for a multilayer perceptron (MLP) acoustic classifier.

Abstract: We describe some aspects of a Broadcast News recognition system based on hybrid HMM/MLP acoustic modeling. These include the use of novel ‘modulation spectrogram’ features, which are combined with conventional models at the posterior probability level, some experiments with nonlinear segment normalization, and an investigation of the interaction of model size and training set size for a multilayer perceptron (MLP) acoustic classifier. We also report preliminary results of incorporating gender-dependence into this system.

1. Background

In recent years, we and our colleagues have promoted the exploration of novel, poorly understood, but promising approaches to speech recognition [2]. While such deviations from incremental improvements might initially hurt performance, the subset of the new methods that would ultimately prove useful would not be found without such explorations. This past year, we attempted to follow this advice while still developing a system with reasonable performance on the automatic transcription of Broadcast News speech. An additional goal was finding approaches that would work well in combination with components developed by our SPRACH partners at Cambridge and Sheffield. Finally, previously published results seemed to indicate that, while the hybrid HMM/connectionist approach was successful for moderate-sized training corpora, it did not appear to take advantage of significant increases in the size of the corpus. Recently improved computational capabilities at ICSI permitted tests to determine whether this was true. Given these considerations, we developed experimental Broadcast News systems that incorporated:

Citations

•10.7916/D80291XN

An Overview of the SPRACH System for the Transcription of Broadcast News

Gary Cook, +9 more

- 01 Jan 1999

TL;DR: This paper describes the SPRACH system, which is based on the connectionist-HMM framework and uses both recurrent neural network and multi-layer perceptron acoustic models, and describes recent developments to CHRONOS, a time-first stack decoder.

28

•10.5075/EPFL-THESIS-2863

Text detection and recognition in images and video sequences

Datong Chen

- 01 Jan 2003

TL;DR: This thesis investigates methods for building an efficient system for detecting and recognizing text of any grayscale value embedded in images and video sequences. It investigates a two-step localization/verification approach, along with two schemes that address the text recognition problem.

27

•Dissertation

Vers le temps réel en transcription automatique de la parole grand vocabulaire

Leila Zouari

- 22 Mar 2007

TL;DR: In this paper, the authors propose a hierarchical partitioning approach based on the similarity between Gaussian distributions, and propose a contextual selection method.

4

•Dissertation

Modeling spontaneous speech variability for large vocabulary continuous speech recognition

Hauke Schramm, +1 more

- 01 Jan 2006

TL;DR: A theoretical framework for an efficient integration of the class specific acoustic and pronunciation models into a one-pass search procedure is developed which incorporates contributions from class specific alternatives in a weighted sum of acoustic probabilities.

3

Journal Article•10.1006/CSLA.2001.0167

Applying dynamic context into MLP/HMM speech recognition system

P. Salmela

- 01 Jul 2001

- Computer Speech & Language

TL;DR: When the dynamic context was included in the MLP/HMM recognition system, the string recognition accuracy of the test set increased from 92.9% to 93.8% on average and the signal-to-noise ratio (SNR) of this test set decreased.

3

References

Journal Article•10.1121/1.399423

Perceptual linear predictive (PLP) analysis of speech

Hynek Hermansky

- 01 Apr 1990

- Journal of the Acoustical Society of Ame...

TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.

3.1K

Journal Article•10.1006/CSLA.1995.0010

Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models

C. J. Leggetter, +1 more

- 01 Apr 1995

- Computer Speech & Language

TL;DR: An important feature of the method is that arbitrary adaptation data can be used: no special enrolment sentences are needed, and adaptation performance improves as more data is used.

2.5K

Journal Article•10.1109/89.326616

RASTA processing of speech

Hynek Hermansky, +1 more

- 01 Oct 1994

- IEEE Transactions on Speech and Audio Pr...

TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.

2.1K

Journal Article•10.1016/0167-6393(96)00003-9

Towards increasing speech recognition error rates

Hervé Bourlard, +5 more

- 01 May 1996

- Speech Communication

TL;DR: In this article, the authors discuss some research directions for ASR that may not always yield an immediate and guaranteed decrease in error rate but which hold some promise for ultimately improving performance in the end applications, including discrimination between rival utterance models, the role of prior information in speech recognition, merging the language and acoustic models, feature extraction and temporal information, and decoding procedures reflecting human perceptual properties.

191

•Proceedings Article

Towards increasing speech recognition error rates.

Hervé Bourlard

- 01 Jan 1995

TL;DR: Issues that will be addressed in this paper include: discrimination between rival utterance models, the role of prior information in speech recognition, merging the language and acoustic models, feature extraction and temporal information, and decoding procedures reflecting human perceptual properties.

122