Polynomial Eigenvalue Decomposition-Based Target Speaker Voice Activity Detection in the Presence of Competing Talkers
05 Sep 2022
TL;DR: This article proposes a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers.
Abstract: Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded to improve the performance and reduce the computation of these applications. In this paper, we propose a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers. The proposed approach uses frame-based processing across multiple microphones to compute the syndrome energy, which is used to test for the presence or absence of a target speaker. The proposed approach is consistently among the best in F1 and balanced accuracy scores over the investigated range of signal-to-interference ratio (SIR) from -10 dB to 20 dB.
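The abstract reports F1 and balanced accuracy scores. As a minimal sketch of how these two metrics are commonly computed from frame-level VAD decisions, the Python function below uses their standard definitions; the function name `vad_scores`, the 0/1 label convention (1 = target speaker active), and the example labels are illustrative assumptions, not taken from the paper.

```python
def vad_scores(reference, predicted):
    """Return (F1, balanced accuracy) for binary frame labels.

    Assumed convention (hypothetical): 1 = target speaker active, 0 = inactive.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)  # correctly detected speech
    tn = sum(1 for r, p in pairs if r == 0 and p == 0)  # correctly rejected frames
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)  # false alarms
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)  # missed speech

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0   # true negative rate

    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    balanced_accuracy = 0.5 * (recall + specificity)
    return f1, balanced_accuracy


# Illustrative labels only: 3 active frames and 5 inactive frames.
reference = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
f1, ba = vad_scores(reference, predicted)
print(f"F1 = {f1:.3f}, balanced accuracy = {ba:.3f}")
```

Balanced accuracy averages the true positive and true negative rates, so it stays informative when active and inactive frames are unevenly represented, whereas F1 does not count true negatives at all.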
Citations
Signal Compaction Using Polynomial EVD for Spherical Array Processing with Applications
TL;DR: This work proposes a framework for signal representation that improves the diagonality factor over the microphone signal representation with a significantly lower computation cost, and improves metrics known as short-time objective intelligibility (STOI) and source-to-distortion ratio (SDR) by up to 0.2 and 20 dB, respectively.
8
A Polynomial Subspace Projection Approach for the Detection of Weak Voice Activity
01 Sep 2022
TL;DR: This article proposes a polynomial subspace projection pre-processor to improve the performance of a voice activity detection (VAD) algorithm. The pre-processor projects the microphone signals onto a lower-dimensional subspace to remove the interferer components and thus eases the detection of the speech target.
Support Estimation of Analytic Eigenvectors of Parahermitian Matrices
18 Oct 2022
TL;DR: This paper proposes a method to estimate the time-domain support of eigenvectors of parahermitian matrices. The method is validated on an ensemble with known support, which the estimated support accurately matches.
Enhanced Space-Time Covariance Estimation Based on a System Identification Approach
01 Sep 2022
TL;DR: This paper shows that a significantly more accurate space-time covariance estimate can be obtained if the source signals driving the signal model are also accessible, such that a system identification approach for the source model becomes viable.
Frame-Based Space-Time Covariance Matrix Estimation for Polynomial Eigenvalue Decomposition-Based Speech Enhancement
05 Sep 2022
TL;DR: This article proposes a frame-based procedure for the estimation of space-time covariance matrices, which is found to yield spatial filters and speech enhancement improvements comparable to the batch method in [1], showing potential for real-time processing.
References
•Book
Fundamentals of speech recognition
Lawrence R. Rabiner, Biing-Hwang Juang +1 more
- 01 Jan 1993
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, time-consuming, and therefore expensive process of manually modeling speech.
9.4K
Performance measurement in blind audio source separation
TL;DR: This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.
Classification assessment methods
TL;DR: A detailed overview of classification assessment measures is presented, covering the basics of these measures and how they work, to serve as a comprehensive source for researchers interested in this field.
2K
A statistical model-based voice activity detection
TL;DR: An effective hang-over scheme is proposed that considers previous observations by modeling speech occurrences as a first-order Markov process; it performs significantly better than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
1.4K
Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging
TL;DR: In this article, an improved minima controlled recursive averaging (IMCRA) approach is proposed for noise estimation in adverse environments involving nonstationary noise, weak speech components, and low input signal-to-noise ratio (SNR).