BeamTransformer: Microphone Array-based Overlapping Speech Detection.

Open Access · Posted Content

- 09 Sep 2021

TL;DR: BeamTransformer is an efficient architecture that combines the beamformer's strength in spatial filtering with the transformer's capability in context sequence modeling, in order to better model the sequential relationships among signals from different spatial directions.

Abstract: We propose BeamTransformer, an efficient architecture that leverages the beamformer's edge in spatial filtering and the transformer's capability in context sequence modeling. BeamTransformer seeks to optimize the modeling of sequential relationships among signals from different spatial directions. Overlapping speech detection is one of the tasks where such optimization is favorable. In this paper, we apply BeamTransformer to detect overlapping speech segments. Compared to a single-channel approach, BeamTransformer excels at learning the relationships among different beam sequences, and is hence able to make predictions not only from the acoustic signals but also from the localization of the source. The results indicate that a successful incorporation of microphone array signals can lead to remarkable gains. Moreover, BeamTransformer goes one step further, as speech from overlapped speakers has been internally separated into different beams.
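To make the idea of "beam sequences" concrete, the following is a minimal sketch of a spatial-filtering front end that turns multi-channel audio into one signal per look direction. It is not the authors' implementation: the abstract does not say which beamformer is used, so a fixed time-domain delay-and-sum beamformer with whole-sample delays is assumed here, and the array geometry, look angles, and all function names (`steering_delays`, `shift`, `delay_and_sum`, `beam_sequences`) are hypothetical. The transformer that would consume these beams is omitted.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, angle_deg, sample_rate):
    """Per-microphone delays (in whole samples) that time-align a far-field
    plane wave arriving at angle_deg, measured from the array axis."""
    cos_theta = math.cos(math.radians(angle_deg))
    return [round(x * cos_theta / SPEED_OF_SOUND * sample_rate)
            for x in mic_positions]


def shift(signal, delay):
    """Return y[n] = signal[n - delay], zero-padded at the edges."""
    n = len(signal)
    return [signal[i - delay] if 0 <= i - delay < n else 0.0
            for i in range(n)]


def delay_and_sum(channels, delays):
    """Average the channels after delaying each one by its steering delay."""
    aligned = [shift(ch, d) for ch, d in zip(channels, delays)]
    return [sum(samples) / len(channels) for samples in zip(*aligned)]


def beam_sequences(channels, mic_positions, look_angles, sample_rate):
    """One beamformed signal per look direction (assumed fixed beams)."""
    return {
        angle: delay_and_sum(
            channels, steering_delays(mic_positions, angle, sample_rate))
        for angle in look_angles
    }


def energy(signal):
    """Sum of squared samples."""
    return sum(v * v for v in signal)


# Toy usage: a white-noise "speaker" at 60 degrees on a 4-mic linear array.
rng = random.Random(0)
sample_rate = 16000
mic_positions = [0.0, 0.05, 0.10, 0.15]  # metres along the array axis
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]
true_delays = steering_delays(mic_positions, 60, sample_rate)
# A microphone further along the axis (towards the source) hears the wave earlier.
channels = [shift(source, -d) for d in true_delays]

look_angles = [0, 30, 60, 90, 120, 150, 180]
beams = beam_sequences(channels, mic_positions, look_angles, sample_rate)
loudest = max(look_angles, key=lambda a: energy(beams[a]))
```

In this toy setup the beam steered at the true source direction adds the channels coherently, while the other beams average misaligned copies of the noise, so `loudest` picks out the 60-degree beam. With two simultaneous speakers at different angles, two beams would carry most of the energy, which is the kind of spatial cue a sequence model over beams could exploit for overlap detection.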


