Open Access · Posted Content
BeamTransformer: Microphone Array-based Overlapping Speech Detection.
Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhi-Jie Yan
TL;DR: This work proposes BeamTransformer, an efficient architecture that combines a beamformer's strength in spatial filtering with a transformer's capability in context sequence modeling, in order to better model the sequential relationships among signals arriving from different spatial directions.
Abstract: We propose BeamTransformer, an efficient architecture that leverages a beamformer's edge in spatial filtering and a transformer's capability in context sequence modeling. BeamTransformer seeks to optimize the modeling of sequential relationships among signals from different spatial directions. Overlapping speech detection is one of the tasks where such optimization is favorable. In this paper, we apply BeamTransformer to the detection of overlapping speech segments. Compared with a single-channel approach, BeamTransformer excels at learning the relationships among different beam sequences and is hence able to make predictions not only from the acoustic signals but also from the localization of the source. The results indicate that a successful incorporation of microphone array signals can lead to remarkable gains. Moreover, BeamTransformer goes one step further, as speech from overlapped speakers has been internally separated into different beams.
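The abstract does not say which beamformer BeamTransformer uses or how the beams are fed to the transformer. The following is therefore only a minimal sketch, assuming a fixed delay-and-sum beamformer on a linear microphone array, of how multichannel audio might be turned into one signal per spatial direction before a sequence model sees it. The function names, array geometry, and sampling rate are hypothetical and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_x, angle_deg, fs):
    """Relative arrival delays, in whole samples, of a far-field plane wave.

    mic_x: microphone positions along the array axis in metres (hypothetical).
    angle_deg: direction of arrival measured from broadside; positive angles
        point toward positive x, so microphones with larger x hear the wave first.
    fs: sampling rate in Hz.
    """
    s = math.sin(math.radians(angle_deg))
    raw = [-x * s / SPEED_OF_SOUND * fs for x in mic_x]
    base = min(raw)  # shift so the earliest microphone has delay 0
    return [round(r - base) for r in raw]


def delay_and_sum(channels, delays):
    """Advance each channel by its arrival delay, then average across channels."""
    n_samples = len(channels[0])
    out = []
    for n in range(n_samples):
        acc = 0.0
        for ch, d in zip(channels, delays):
            idx = n + d
            if 0 <= idx < n_samples:  # samples outside the buffer count as zero
                acc += ch[idx]
        out.append(acc / len(channels))
    return out


def beam_bank(channels, mic_x, angles_deg, fs):
    """One beamformed signal per look direction (an illustrative 'beam sequence')."""
    return [
        delay_and_sum(channels, steering_delays(mic_x, a, fs))
        for a in angles_deg
    ]
```

In this sketch, a beam steered toward a source adds the channels coherently, while beams steered elsewhere smear the same source across time. A downstream model that compares beams could, in principle, use that spatial contrast as a cue for how many directions are active at once. Whether BeamTransformer works this way is not stated in the abstract.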