Proceedings Article (DOI: 10.1109/ICASSP.1999.759777)
Speaker adaptation using maximum likelihood model interpolation
Zuoying Wang, Feng Liu
- 15 Mar 1999
- Vol. 2, pp 753-756
9
TL;DR: Experiments show that three adaptation sentences can give a significant performance improvement, and that further improvement can be obtained as the number of speaker-dependent (SD) models increases.
Abstract: A speaker adaptation scheme named maximum likelihood model interpolation (MLMI) is proposed. The basic idea of MLMI is to compute the speaker-adapted (SA) model of a test speaker as a convex linear combination of a set of speaker-dependent (SD) models. Given a set of training speakers, we first calculate an SD model for each training speaker, as well as the speaker-independent (SI) models. The mean vector of the SA model is then computed as a weighted sum of the SD mean vectors, while its covariance matrix is taken from the SI model. An algorithm is given that estimates the weights by maximizing the likelihood of the adaptation data under the SA model. Experiments show that three adaptation sentences can give a significant performance improvement, and that further improvement can be obtained as the number of SD models increases.
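The abstract does not spell out the weight-estimation algorithm, so the following is only a minimal sketch of the MLMI idea under stated assumptions, not the authors' method. It assumes diagonal covariances, adaptation frames already hard-aligned to Gaussians (for example by a Viterbi pass), a single weight vector shared by all Gaussians, and projected gradient ascent as the optimiser. The names `estimate_weights`, `sa_mean`, `adapt_model`, `project_to_simplex`, `sd_means`, `si_vars` and `frames` are hypothetical. Each SA mean is the convex combination of the corresponding SD means, the SA variances are copied from the SI model, and the weights are chosen to maximise the likelihood of the adaptation data.

```python
# Minimal MLMI sketch. Assumptions not taken from the paper: diagonal covariances,
# adaptation frames already hard-aligned to Gaussians, one weight vector shared by
# all Gaussians, and projected gradient ascent as the optimiser.
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def project_to_simplex(v: Sequence[float]) -> Vector:
    """Euclidean projection onto {w : w_k >= 0, sum_k w_k = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumsum += u_i
        t = (cumsum - 1.0) / i
        if u_i - t > 0:  # the indices satisfying this form a prefix of u
            theta = t
    return [max(x - theta, 0.0) for x in v]


def sa_mean(weights: Sequence[float], sd_means_g: Sequence[Vector]) -> Vector:
    """SA mean of one Gaussian: sum_k w_k * mu_k over the K SD models."""
    dim = len(sd_means_g[0])
    return [sum(w * mu[d] for w, mu in zip(weights, sd_means_g)) for d in range(dim)]


def estimate_weights(
    sd_means: Dict[str, List[Vector]],
    si_vars: Dict[str, Vector],
    frames: Sequence[Tuple[str, Vector]],
    iters: int = 500,
) -> Vector:
    """Maximise the log-likelihood of the aligned frames over convex weights.

    sd_means[g][k]: mean of Gaussian g in SD model k.
    si_vars[g]:     diagonal SI variances of Gaussian g (reused by the SA model).
    frames:         (g, x) pairs, i.e. frame x aligned to Gaussian g.
    """
    num_sd = len(next(iter(sd_means.values())))
    w = [1.0 / num_sd] * num_sd  # start from uniform interpolation
    # The log-likelihood is a concave quadratic in w; 1/trace(H) is a safe step
    # size because the trace bounds the largest eigenvalue of the PSD Hessian H.
    trace = sum(mu[d] ** 2 / si_vars[g][d]
                for g, x in frames for mu in sd_means[g] for d in range(len(x)))
    if trace == 0.0:
        return w
    step = 1.0 / trace
    for _ in range(iters):
        grad = [0.0] * num_sd
        for g, x in frames:
            m = sa_mean(w, sd_means[g])
            for k, mu in enumerate(sd_means[g]):
                grad[k] += sum((x[d] - m[d]) * mu[d] / si_vars[g][d]
                               for d in range(len(x)))
        # Gradient ascent step, then project back onto the probability simplex.
        w = project_to_simplex([wk + step * gk for wk, gk in zip(w, grad)])
    return w


def adapt_model(
    sd_means: Dict[str, List[Vector]],
    si_vars: Dict[str, Vector],
    weights: Sequence[float],
) -> Dict[str, Tuple[Vector, Vector]]:
    """SA model: interpolated means, variances copied from the SI model."""
    return {g: (sa_mean(weights, mus), list(si_vars[g])) for g, mus in sd_means.items()}
```

For example, with two SD models whose means for Gaussian `"a"` are `[1, 0]` and `[0, 1]`, unit SI variances, and aligned frames `[0, 1]` and `[0.5, 0.5]`, `estimate_weights` returns weights close to `[0.25, 0.75]`. Because the covariances are fixed, the log-likelihood is concave in the weights, so projected gradient ascent reaches the best convex combination; the simplex projection is what keeps the interpolation convex, as the abstract requires.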
Citations
Maximum likelihood stochastic transformation adaptation for medium and small data sets
TL;DR: In this article, the authors propose maximum likelihood stochastic transformation (MLST) for speaker adaptation, which estimates multiple linear transforms per class of models and a transform weight vector specific to each component (Gaussians, in this case).
8
Multigrained Model Adaptation With MAP and Reference Speaker Weighting For Text-Independent Speaker Verification
Xianyu Zhao, Yuan Dong, Jun Luo, Hao Yang, Haila Wang
- 14 May 2006
TL;DR: A new speaker adaptation method is presented that combines MAP and reference speaker weighting (RSW) adaptation in a hierarchical, multigrained manner, enabling all model components to be updated while striking a good balance between model complexity and the amount of available data.
4
Proceedings Article
Using spatial correlation information in speech recognition
Peng Yu, Zuoying Wang
- 01 Jan 2001
TL;DR: A new method of using spatial correlation information in speech recognition is proposed: linear equations are used to describe the spatial correlation, their coefficients are calculated by a K-L (Karhunen-Loève) transformation, and a new training algorithm is developed under these linear constraints.
2
Patent
Method of creating an acoustic model for a speech recognition system
Bartosik Heinrich
- 01 Jul 2004
TL;DR: In this patent, the acoustic model (U) for a speaker who uses a voice recognition system is created as a weighted linear combination of the initial models (Gi), using previously determined weight factors (gi) that are specific to that speaker's acoustic model.
2
Proceedings Article
Linguistic tree based maximum likelihood model interpolation
Liu Feng, Chiwei Che, Peng Yu, Zuoying Wang
- 01 Jan 1999
TL;DR: A speaker adaptation method is presented that computes the speaker-adapted model as a weighted sum of a set of speaker-dependent models; experiments show that as few as one to three sentences yield a significant performance improvement.
References
Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
TL;DR: An important feature of the method is that arbitrary adaptation data can be used, so no special enrolment sentences are needed, and adaptation performance improves as more data is used.
2.5K
Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
Jean-Luc Gauvain, Chin-Hui Lee
TL;DR: A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
2.5K
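Among the results in that reference is the MAP re-estimation formula for a Gaussian mean under a conjugate prior, which interpolates between the prior mean and the data. The following is a minimal sketch for a single Gaussian component, assuming the occupation probabilities γ_t come from a previous E-step and treating the prior weight τ as a fixed hyperparameter; the function name `map_mean` is hypothetical.

```python
from typing import List, Sequence


def map_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float,
) -> List[float]:
    """MAP mean: (tau * mu0 + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)."""
    occupancy = sum(posteriors)  # soft count of frames assigned to this Gaussian
    return [
        (tau * prior_mean[d] + sum(g * x[d] for g, x in zip(posteriors, frames)))
        / (tau + occupancy)
        for d in range(len(prior_mean))
    ]
```

With τ = 0 the estimate reduces to the occupancy-weighted ML mean; as τ grows relative to the occupancy, it stays closer to the prior mean, which is why MAP adaptation needs more data than interpolation-based schemes before every component moves.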
Voice dictation of Mandarin Chinese
Lin-Shan Lee
- 01 Jul 1997
TL;DR: The characteristic structure of Mandarin Chinese is analyzed, with primary focus on the key technologies for Mandarin dictation, including the basic system architecture, acoustic modeling/processing, and linguistic modeling/processing.
79
Proceedings Article
Speaker adaptation based on pre-clustering training speakers
Yuqing Gao, Mukund Padmanabhan, Michael Picheny
- 01 Jan 1997
42
Speaker adaptation based on pre-clustering training speakers
Yuqing Gao, Mukund Padmanabhan, Michael Picheny
- 22 Sep 1997
33