A variational EM algorithm for learning eigenvoice parameters in mixed signals
Ron Weiss, Daniel P. W. Ellis
- 19 Apr 2009
- pp 113-116
TL;DR: An efficient learning algorithm is derived for model-based source separation of single-channel speech mixtures where the precise source characteristics are not known a priori.
Abstract: We derive an efficient learning algorithm for model-based source separation for use on single-channel speech mixtures where the precise source characteristics are not known a priori. The sources are modeled using factor-analyzed hidden Markov models (HMMs) where source-specific characteristics are captured by an “eigenvoice” speaker subspace model. The proposed algorithm is able to learn adaptation parameters for two speech sources when only a mixture of signals is observed. We evaluate the algorithm on the 2006 Speech Separation Challenge data set and show that it is significantly faster than our earlier system at a small cost in terms of performance.
Citations
Modeling Dynamical Influence in Human Interaction: Using data to make better inferences about influence within social systems
TL;DR: The model recovers known estimates of influence, generates results that are consistent with other measures of social networks, and uncovers important shifts in the way states may be transmitted between actors at different points in time.
•Posted Content
Modeling Dynamical Influence in Human Interaction Patterns
TL;DR: The model recovers known estimates of influence, generates results that are consistent with other measures of social networks, and uncovers important shifts in the way states may be transmitted between actors at different points in time.
Speaker Agnostic Foreground Speech Detection from Audio Recordings in Workplace Settings from Wearable Recorders
Amrutha Nadarajan, Krishna Somandepalli, Shrikanth S. Narayanan
- 12 May 2019
TL;DR: A convolutional neural network model is proposed to predict foreground regions using a limited set of audio features, and it is shown that these models generalize across the proxy corpora collected in-house to approximately match the deployment environment.
15
Model-Based Multiple Pitch Tracking Using Factorial HMMs: Model Adaptation and Inference
Michael Wohlmayr, Franz Pernkopf
TL;DR: An EM-like iterative adaptation framework is developed that can adapt the model parameters to the specific situation using only speech mixture data, along with efficient approaches based on observation likelihood pruning.
11
Single channel source separation based on sparse source observation model with harmonic constraint
Tomohiro Nakatani, Shoko Araki
- 14 Mar 2010
TL;DR: A harmonicity-based source separation method is implemented using a robust fundamental frequency (F0) estimation algorithm, and the experimental results confirm the effectiveness of the proposed method.
5
References
Factorial Hidden Markov Models
Zoubin Ghahramani, Michael I. Jordan
- 27 Nov 1995
TL;DR: A generalization of HMMs in which the state is factored into multiple state variables and is therefore represented in a distributed manner, and a structured approximation in which the state variables are decoupled, yielding a tractable algorithm for learning the parameters of the model.
A Study of Interspeaker Variability in Speaker Verification
TL;DR: It is shown that when a large joint factor analysis model is trained in this way and tested on the core condition, the extended data condition and the cross-channel condition, it is capable of performing at least as well as fusions of multiple systems of other types.
Hidden Markov model decomposition of speech and noise
Andrew Varga, Roger K. Moore
- 03 Apr 1990
TL;DR: A technique of signal decomposition using hidden Markov models is described that provides an optimal method of decomposing simultaneous processes and has wide implications for signal separation in general and improved speech modeling in particular.
577
•Proceedings Article
Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System
Trausti Kristjansson, John R. Hershey, Peder A. Olsen, Steven J. Rennie, Ramesh A. Gopinath
- 01 Jan 2006
TL;DR: A system for model-based speech separation is described that achieves super-human recognition performance when two talkers speak at similar levels and incorporates a novel method for two-talker speaker identification and gain estimation.
127
Speech separation using speaker-adapted eigenvoice speech models
Ron Weiss, Daniel P. W. Ellis
TL;DR: An algorithm to infer the characteristics of the sources present in a mixture is presented, allowing for significantly improved separation performance over that obtained using unadapted source models.
88