Open Access. DOI: 10.7916/D8MW2SJ1
Reducing errors by increasing the error rate: MLP Acoustic Modeling for Broadcast News Transcription
Nelson Morgan, Daniel P. W. Ellis, Eric Fosler-Lussier, Adam Janin, Brian Kingsbury +4 more
- 01 Jan 1999
TL;DR: Some aspects of a Broadcast News recognition system based on hybrid HMM/MLP acoustic modeling are described, including the use of novel ‘modulation spectrogram’ features which are combined with conventional models at the posterior probability level, and an investigation of the interaction of model size and training set size for a multilayer perceptron (MLP) acoustic classifier.
Abstract: We describe some aspects of a Broadcast News recognition system based on hybrid HMM/MLP acoustic modeling. These include the use of novel ‘modulation spectrogram’ features which are combined with conventional models at the posterior probability level, some experiments with nonlinear segment normalization, and an investigation of the interaction of model size and training set size for a multilayer perceptron (MLP) acoustic classifier. We also report preliminary results of incorporating gender-dependence into this system.

1. Background

In recent years, we and our colleagues have promoted the exploration of novel, poorly understood, but promising approaches to speech recognition [2]. While such deviations from incremental improvements might initially hurt performance, the subset of the new methods that would ultimately prove useful would not be found without such explorations. This past year, we attempted to follow this advice, while still developing a system with reasonable performance on the automatic transcription of Broadcast News speech. An additional goal was finding approaches that would work well in combination with components developed by our SPRACH partners at Cambridge and Sheffield. Finally, previously published results seemed to indicate that, while the hybrid HMM/connectionist approach was successful for moderate-sized training corpora, it did not appear to take advantage of significant increases in the size of the corpus. Recently improved computational capabilities at ICSI permitted tests to determine if this was true. Given these considerations, we developed experimental Broadcast News systems that incorporated:
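As an illustration of what combining feature streams "at the posterior probability level" can look like, the sketch below merges per-frame class posteriors from two classifiers. The abstract does not state which combination rule the system used; the equal-weight log-domain average shown here is an assumption chosen for simplicity, and the names `combine_posteriors`, `scaled_likelihoods`, `plp_post`, and `msg_post` are hypothetical. The second function shows the division by class priors that hybrid HMM/MLP systems commonly apply before decoding.

```python
import math


def combine_posteriors(streams, weights=None, floor=1e-10):
    """Merge posterior vectors from several classifiers for one frame.

    ASSUMPTION: a weighted average of log posteriors followed by
    renormalization. The source does not specify its combination rule.
    """
    if weights is None:
        # Equal weighting across streams (illustrative default).
        weights = [1.0 / len(streams)] * len(streams)
    n_classes = len(streams[0])
    # Weighted sum of log posteriors per class; floor avoids log(0).
    log_comb = [
        sum(w * math.log(max(p[k], floor)) for w, p in zip(weights, streams))
        for k in range(n_classes)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_comb)
    unnorm = [math.exp(v - peak) for v in log_comb]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def scaled_likelihoods(posteriors, priors):
    """Divide class posteriors by class priors to obtain scaled likelihoods."""
    return [p / q for p, q in zip(posteriors, priors)]


# Hypothetical posteriors over three classes from two feature streams.
plp_post = [0.7, 0.2, 0.1]
msg_post = [0.5, 0.4, 0.1]
combined = combine_posteriors([plp_post, msg_post])
```

With equal weights this rule reduces to a normalized geometric mean, so a class must be supported by both streams to score highly; other rules, such as a linear average, would behave differently.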
Citations
An Overview of the SPRACH System for the Transcription of Broadcast News
Gary Cook, James Christie, Daniel P. W. Ellis, Eric Fosler-Lussier, Yoshi Gotoh, Brian Kingsbury, Nelson Morgan, Steve Renals, Tony Robinson, Gethin Williams +9 more
- 01 Jan 1999
TL;DR: This paper describes the SPRACH system, which is based on the connectionist-HMM framework and uses both recurrent neural network and multi-layer perceptron acoustic models, and describes recent developments to CHRONOS, a time-first stack decoder.
Text detection and recognition in images and video sequences
Datong Chen
- 01 Jan 2003
TL;DR: This thesis investigates methods for building an efficient application system for detecting and recognizing text of any grayscale value embedded in images and video sequences. It examines a two-step localization/verification approach, along with two schemes that address the text recognition problem.
27
•Dissertation
Vers le temps réel en transcription automatique de la parole grand vocabulaire
Leila Zouari
- 22 Mar 2007
TL;DR: In this paper, the authors propose a hierarchical partitioning approach based on the similarity between Gaussian distributions, and propose a contextual selection method.
4
•Dissertation
Modeling spontaneous speech variability for large vocabulary continuous speech recognition
Hauke Schramm, Hermann Ney +1 more
- 01 Jan 2006
TL;DR: A theoretical framework for an efficient integration of the class specific acoustic and pronunciation models into a one-pass search procedure is developed which incorporates contributions from class specific alternatives in a weighted sum of acoustic probabilities.
Applying dynamic context into MLP/HMM speech recognition system
TL;DR: When the dynamic context was included in the MLP/HMM recognition system, the string recognition accuracy of the test set increased from 92.9% to 93.8% on average and the signal-to-noise ratio (SNR) of this test set decreased.
3
References
Perceptual linear predictive (PLP) analysis of speech
TL;DR: A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, which uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum, and yields a low-dimensional representation of speech.
3.1K
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
TL;DR: An important feature of the method is that arbitrary adaptation data can be used (no special enrolment sentences are needed) and that adaptation performance improves as more data is used.
2.5K
RASTA processing of speech
Hynek Hermansky, Nelson Morgan +1 more
TL;DR: The theoretical and experimental foundations of the RASTA method are reviewed, the relationship with human auditory perception is discussed, the original method is extended to combinations of additive noise and convolutional noise, and an application is shown to speech enhancement.
2.1K
Towards increasing speech recognition error rates
TL;DR: In this article, the authors discuss some research directions for ASR that may not always yield an immediate and guaranteed decrease in error rate but which hold some promise for ultimately improving performance in the end applications, including discrimination between rival utterance models, the role of prior information in speech recognition, merging the language and acoustic models, feature extraction and temporal information, and decoding procedures reflecting human perceptual properties.
191
•Proceedings Article
Towards increasing speech recognition error rates.
Hervé Bourlard
- 01 Jan 1995
TL;DR: Issues that will be addressed in this paper include: discrimination between rival utterance models, the role of prior information in speech recognition, merging the language and acoustic models, feature extraction and temporal information, and decoding procedures reflecting human perceptual properties.
122