Multimodal speaker detection using error feedback dynamic Bayesian networks

doi:10.1109/CVPR.2000.854730

Proceedings Article•10.1109/CVPR.2000.854730

Vladimir Pavlovic, +3 more

- 01 Jan 2000

- Vol. 2, pp 34-41

70

TL;DR: This work formulates a learning framework for dynamic Bayesian networks (DBNs) based on error feedback and statistical boosting theory, and applies it to audio/visual speaker detection in an interactive kiosk environment using "off-the-shelf" visual and audio sensors.

Citations

Journal Article•10.1080/00207160701303912

Computer vision and pattern recognition

Nanning Zheng, +4 more

- 01 Sep 2007

TL;DR: This Special Issue of the International Journal of Computer Mathematics (IJCM) offers a venue for innovative approaches in computer vision and pattern recognition, fields that have changed everyday life dramatically over the last few years, and aims to provide readers with cutting-edge, topical information for their related research.

2.1K

Book Chapter•10.1007/11853565_13

Mobility detection using everyday GSM traces

Timothy Sohn, +9 more

- 17 Sep 2006

TL;DR: This paper explores how coarse-grained GSM data from mobile phones can be used to recognize high-level properties of user mobility and daily step count, and demonstrates that useful mobility modes for several application domains can be recognized even without knowledge of observed cell tower locations.

334

Journal Article•10.1109/TSMCA.2005.855922

A probabilistic framework for modeling and real-time monitoring human fatigue

Qiang Ji, +2 more

- 01 Sep 2006

TL;DR: Introduces a probabilistic framework based on Bayesian networks for modeling human fatigue and inferring it in real time by integrating information from various sensory data and relevant contextual information, leading to more robust and accurate fatigue modeling and inference.

280

Proceedings Article•10.1109/ICIP.2003.1247172

Audio-visual speaker tracking with importance particle filters

Daniel Gatica-Perez, +4 more

- 24 Nov 2003

TL;DR: It is shown that imperfect single modalities can be combined into an algorithm that automatically initializes and tracks a speaker, switches between multiple speakers, tolerates visual clutter, and recovers from total AV object occlusion, in the context of a multimodal meeting room.

248

Journal Article•10.1109/TASL.2006.881678

Audiovisual Probabilistic Tracking of Multiple Speakers in Meetings

Daniel Gatica-Perez, +3 more

- 01 Feb 2007

- IEEE Transactions on Audio, Speech, and ...

TL;DR: In this article, a probabilistic approach is proposed to jointly track the location and speaking activity of multiple speakers in a multisensor meeting room, equipped with a small microphone array and multiple uncalibrated cameras.

163

References

Book

Fundamentals of speech recognition

Lawrence R. Rabiner, +1 more

- 01 Jan 1993

TL;DR: This textbook covers the theory and practice of automatic speech recognition, including speech signal analysis, pattern-comparison techniques, and hidden Markov models.

9.4K

Journal Article•10.1109/34.655647

Neural network-based face detection

Henry Allan Rowley, +2 more

- 01 Jan 1998

- IEEE Transactions on Pattern Analysis an...

TL;DR: Presents a neural network-based upright frontal face detection system that arbitrates between multiple networks to improve performance over a single network, along with a straightforward procedure for aligning positive face examples for training.

4.2K

Proceedings Article•10.1145/279943.279960

Improved boosting algorithms using confidence-rated predictions

Robert E. Schapire, +1 more

- 24 Jul 1998

TL;DR: Several improvements to Freund and Schapire’s AdaBoost boosting algorithm are described, particularly in a setting in which hypotheses may assign confidences to each of their predictions.

3.3K

Proceedings Article•10.1109/CVPR.1996.517075

Neural network-based face detection

Henry Allan Rowley, +2 more

- 18 Jun 1996

TL;DR: Presents a neural network-based face detection system that arbitrates between multiple networks to improve performance over a single network, trained with a bootstrap algorithm that eliminates the difficult task of manually selecting non-face training examples.

2.6K

Proceedings Article•10.1109/CVPR.1997.609450

Coupled hidden Markov models for complex action recognition

Matthew Brand, +2 more

- 17 Jun 1997

TL;DR: Presents algorithms for coupling and training hidden Markov models (HMMs) to model interacting processes, and demonstrates their superiority to conventional HMMs in a vision task classifying two-handed actions.

1.2K

Related Papers (5)

Sequential Monte Carlo fusion of sound and vision for speaker tracking

Sequential Monte Carlo methods in practice

Particle filtering algorithms for tracking an acoustic source in a reverberant environment

Learning Joint Statistical Models for Audio-Visual Fusion and Segregation

Improved boosting algorithms using confidence-rated predictions