Proceedings Article, DOI: 10.1109/CVPR.2000.854730
Multimodal speaker detection using error feedback dynamic Bayesian networks
Vladimir Pavlovic, Ashutosh Garg, James M. Rehg, Thomas S. Huang
- 01 Jan 2000
- Vol. 2, pp. 34-41
TL;DR: This work formulates a learning framework for DBNs based on error feedback and statistical boosting theory, and applies it to audio/visual speaker detection in an interactive kiosk environment using "off-the-shelf" visual and audio sensors.
Abstract: The design and development of novel human-computer interfaces poses a challenging problem: the actions and intentions of users have to be inferred from sequences of noisy and ambiguous multi-sensory data such as video and sound. Temporal fusion of multiple sensors has been efficiently formulated using dynamic Bayesian networks (DBNs), which allow the power of statistical inference and learning to be combined with contextual knowledge of the problem. Unfortunately, simple learning methods can cause such appealing models to fail when the data exhibit complex behavior. We formulate a learning framework for DBNs based on error feedback and statistical boosting theory. We apply this framework to the problem of audio/visual speaker detection in an interactive kiosk environment using "off-the-shelf" visual and audio sensors (face, skin, texture, mouth motion, and silence detectors). Detection results obtained in this setup demonstrate the superiority of our learning framework over classical maximum-likelihood (ML) learning in DBNs.
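The error-feedback idea behind boosting can be illustrated with a minimal sketch. The code below is plain discrete AdaBoost (Freund and Schapire), not the authors' error-feedback DBN: it treats each video frame independently and ignores the temporal structure that a DBN models. It assumes, hypothetically, that each frame is summarized by the binary outputs of the five detectors named in the abstract (face, skin, texture, mouth motion, silence) and labeled +1 (speaker) or -1 (non-speaker). The names `adaboost`, `stump_predict`, and `predict`, and the toy data, are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical frame encoding: binary outputs of five detectors, in the order
# (face, skin, texture, mouth_motion, silence). Labels: +1 speaker, -1 not.
Frame = Tuple[int, ...]
Stump = Tuple[float, int, int]  # (alpha, feature index, polarity)


def stump_predict(frame: Frame, feature: int, polarity: int) -> int:
    """Weak learner: vote +1/-1 from a single detector bit, optionally flipped."""
    vote = 1 if frame[feature] == 1 else -1
    return vote * polarity


def adaboost(frames: Sequence[Frame], labels: Sequence[int], rounds: int) -> List[Stump]:
    n = len(frames)
    weights = [1.0 / n] * n  # start with uniform weights over training frames
    ensemble: List[Stump] = []
    for _ in range(rounds):
        # Choose the detector and polarity with the lowest weighted error.
        best = None
        for f in range(len(frames[0])):
            for p in (1, -1):
                err = sum(w for w, x, y in zip(weights, frames, labels)
                          if stump_predict(x, f, p) != y)
                if best is None or err < best[0]:
                    best = (err, f, p)
        err, f, p = best
        if err >= 0.5:
            break  # no weak learner beats chance; stop early
        err = max(err, 1e-10)  # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, p))
        # Error feedback: misclassified frames (y * h = -1) gain weight,
        # correctly classified frames lose weight.
        weights = [w * math.exp(-alpha * y * stump_predict(x, f, p))
                   for w, x, y in zip(weights, frames, labels)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: Sequence[Stump], frame: Frame) -> int:
    """Sign of the alpha-weighted vote; ties go to +1."""
    score = sum(a * stump_predict(frame, f, p) for a, f, p in ensemble)
    return 1 if score >= 0 else -1


# Toy usage with made-up frames (not data from the paper).
frames = [(1, 1, 0, 1, 0), (1, 1, 1, 1, 0), (1, 0, 0, 0, 1),
          (0, 0, 1, 0, 1), (1, 1, 0, 0, 0), (0, 1, 0, 1, 0)]
labels = [1, 1, -1, -1, -1, 1]
model = adaboost(frames, labels, rounds=3)
print([predict(model, x) for x in frames])  # → [1, 1, -1, -1, -1, 1]
```

Frames that the current ensemble misclassifies gain weight, so later rounds concentrate on the hard cases; this reweighting is the core of boosting-style error feedback. The sketch does not show how the paper applies it to DBN parameter learning.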
Citations
Computer vision and pattern recognition
Nanning Zheng, George Loizou, Xiaoyi Jiang, Xuguang Lan, Xuelong Li
- 01 Sep 2007
TL;DR: This Special Issue of the International Journal of Computer Mathematics (IJCM) offers a venue to present innovative approaches in computer vision and pattern recognition, fields that have been dramatically changing everyday life over the last few years, and aims to provide readers with cutting-edge, topical information for their research.
Mobility detection using everyday GSM traces
Timothy Sohn, Alexander Varshavsky, Anthony LaMarca, Mike Y. Chen, Tanzeem Choudhury, Ian Smith, Sunny Consolvo, Jeffrey Hightower, William G. Griswold, Eyal de Lara
- 17 Sep 2006
TL;DR: This paper explores how coarse-grained GSM data from mobile phones can be used to recognize high-level properties of user mobility and daily step count, and demonstrates that mobility modes useful for several application domains can be recognized even without knowledge of the observed cell tower locations.
A probabilistic framework for modeling and real-time monitoring human fatigue
Qiang Ji, P. Lan, Carl G. Looney
- 01 Sep 2006
TL;DR: A probabilistic framework based on Bayesian networks is introduced for modeling and inferring human fatigue in real time by integrating information from various sensors with relevant contextual information, leading to more robust and accurate fatigue modeling and inference.
Audio-visual speaker tracking with importance particle filters
Daniel Gatica-Perez, Guillaume Lathoud, Iain McCowan, Jean-Marc Odobez, Darren Moore
- 24 Nov 2003
TL;DR: It is shown that imperfect single modalities can be combined into an algorithm that automatically initializes and tracks a speaker, switches between multiple speakers, tolerates visual clutter, and recovers from total AV object occlusion, in the context of a multimodal meeting room.
Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings
TL;DR: In this article, a probabilistic approach is proposed to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras.
References
Fundamentals of speech recognition (book)
Lawrence R. Rabiner, Biing-Hwang Juang
- 01 Jan 1993
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
Neural network-based face detection
TL;DR: A neural network-based upright frontal face detection system that arbitrates between multiple networks to improve performance over a single network, and a straightforward procedure for aligning positive face examples for training.
Improved boosting algorithms using confidence-rated predictions
Robert E. Schapire, Yoram Singer
- 24 Jul 1998
TL;DR: Several improvements to Freund and Schapire’s AdaBoost boosting algorithm are described, particularly in a setting in which hypotheses may assign confidences to each of their predictions.
Neural network-based face detection
Henry Allan Rowley, Shumeet Baluja, Takeo Kanade
- 18 Jun 1996
TL;DR: A neural network-based face detection system that arbitrates between multiple networks to improve performance over a single network, and uses a bootstrap algorithm that eliminates the difficult task of manually selecting non-face training examples.
Coupled hidden Markov models for complex action recognition
Matthew Brand, Nuria Oliver, Alex Pentland
- 17 Jun 1997
TL;DR: Algorithms are presented for coupling and training hidden Markov models (HMMs) to model interacting processes, and their superiority to conventional HMMs is demonstrated in a vision task classifying two-handed actions.
Related Papers
Arnaud Doucet, Nando de Freitas, Neil Gordon, Adrian F. M. Smith
- 01 Jan 2001
Robert E. Schapire, Yoram Singer
- 24 Jul 1998