Proceedings Article, DOI: 10.1109/IWAENC53105.2022.9914796
Polynomial Eigenvalue Decomposition-Based Target Speaker Voice Activity Detection in the Presence of Competing Talkers
Vincent W. Neo, Stephan Weiss, Simon W. McKnight, Aidan O. T. Hogg, Patrick A. Naylor +4 more
- 05 Sep 2022
pp 1-5
9
TL;DR: This paper proposes a polynomial eigenvalue decomposition-based target-speaker VAD algorithm that detects unseen target speakers in the presence of competing talkers and is consistently among the best in F1 and balanced accuracy scores over the investigated range of signal-to-interference ratio (SIR).
Abstract: Voice activity detection (VAD) algorithms are essential for many speech processing applications, such as speaker diarization, automatic speech recognition, speech enhancement, and speech coding. With a good VAD algorithm, non-speech segments can be excluded to improve the performance and reduce the computation of these applications. In this paper, we propose a polynomial eigenvalue decomposition-based target-speaker VAD algorithm to detect unseen target speakers in the presence of competing talkers. The proposed approach uses frame-based processing across multiple microphones to compute the syndrome energy, which is used to test for the presence or absence of a target speaker. The proposed approach is consistently among the best in F1 and balanced accuracy scores over the investigated range of signal-to-interference ratio (SIR) from -10 dB to 20 dB.
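The abstract reports F1 and balanced accuracy scores. As a rough illustration of how such scores could be computed from per-frame VAD decisions, here is a minimal sketch. It assumes binary frame labels (1 = target speaker active, 0 = inactive); the function names and the example labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(reference, decisions):
    """Count TP, FP, TN, FN for binary per-frame labels (1 = target active)."""
    if len(reference) != len(decisions):
        raise ValueError("reference and decisions must have the same length")
    tp = fp = tn = fn = 0
    for ref, dec in zip(reference, decisions):
        if ref and dec:
            tp += 1
        elif not ref and dec:
            fp += 1
        elif not ref and not dec:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_score(reference, decisions):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    tp, fp, _, fn = confusion_counts(reference, decisions)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(reference, decisions):
    """Mean of the true positive rate and the true negative rate."""
    tp, fp, tn, fn = confusion_counts(reference, decisions)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return 0.5 * (tpr + tnr)


# Hypothetical frame labels, for illustration only.
reference = [1, 1, 1, 0, 0, 0, 0, 1]
decisions = [1, 1, 0, 0, 0, 1, 0, 1]
print(f1_score(reference, decisions), balanced_accuracy(reference, decisions))
```

Balanced accuracy weights the speech and non-speech classes equally, which makes it less sensitive than plain accuracy to how often the target speaker is actually active.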
Citations
Signal Compaction Using Polynomial EVD for Spherical Array Processing with Applications
TL;DR: This work proposes a framework for signal representation that improves the diagonality factor over the microphone signal representation with a significantly lower computation cost, and improves metrics known as short-time objective intelligibility (STOI) and source-to-distortion ratio (SDR) by up to 0.2 and 20 dB, respectively.
8
A Polynomial Subspace Projection Approach for the Detection of Weak Voice Activity
Vincent W. Neo, Stephan Weiss, Patrick A. Naylor +2 more
- 01 Sep 2022
TL;DR: A polynomial subspace projection pre-processor is proposed to improve an existing VAD algorithm. It projects the microphone signals onto a lower-dimensional subspace, which attempts to remove the interferer components and thus eases the detection of the speech target.
7
Support Estimation of Analytic Eigenvectors of Parahermitian Matrices
Faizan Khattak, Ian K. Proudler, Stephan Weiss +2 more
- 18 Oct 2022
TL;DR: This article proposes a method to estimate the time-domain support of eigenvectors in the discrete Fourier transform (DFT) domain for analytic eigenvector extraction from parahermitian matrices.
6
Enhanced Space-Time Covariance Estimation Based on a System Identification Approach
Faizan Ahmad Khattak, Ian K. Proudler, S. Weiss +2 more
- 01 Sep 2022
TL;DR: This paper shows that a significantly more accurate estimate can be obtained if the source signals driving the signal model are also accessible, such that a system identification approach for the source model becomes viable.
2
Polynomial power method: An extension of the standard power method to para-Hermitian matrices
Faizan Khattak, Ian K. Proudler, Stephan Weiss +2 more
TL;DR: This paper extends the power method to polynomial para-Hermitian matrices, extracting the principal analytic eigenpair through repeated polynomial multiplication and normalization, outperforming existing algorithms on randomly generated para-Hermitian matrices.
1
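For context on the entry above, the following is a minimal sketch of the standard power method that the cited paper extends. It handles only a constant, real symmetric matrix, not a polynomial para-Hermitian one, so it is not the paper's algorithm; the function name, the example matrix, and the stopping rule are illustrative assumptions.

```python
import math


def power_method(matrix, iterations=200, tol=1e-12):
    """Estimate the dominant eigenpair of a real symmetric matrix.

    Standard (non-polynomial) power method: multiply, normalise, repeat.
    Assumes the dominant eigenvalue is unique in magnitude and that the
    start vector is not orthogonal to the dominant eigenvector.
    """
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # unit-norm start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        # Multiplication step: product = matrix @ vec
        product = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of the matrix")
        # Normalisation step
        vec = [x / norm for x in product]
        # Rayleigh quotient vec^T (matrix @ vec) for a unit-norm vec
        applied = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        new_eigenvalue = sum(v * a for v, a in zip(vec, applied))
        if abs(new_eigenvalue - eigenvalue) < tol:
            eigenvalue = new_eigenvalue
            break
        eigenvalue = new_eigenvalue
    return eigenvalue, vec


# Hypothetical 2x2 example; its dominant eigenvalue is (7 + sqrt(5)) / 2.
value, vector = power_method([[4.0, 1.0], [1.0, 3.0]])
print(value, vector)
```

According to the TL;DR above, the polynomial variant replaces the matrix-vector product with repeated polynomial multiplication and normalization; that step is not shown here.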
References
Performance measurement in blind audio source separation
TL;DR: This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.
Classification assessment methods
TL;DR: A detailed overview of classification assessment measures is presented, with the aim of providing the basics of these measures, showing how they work, and serving as a comprehensive source for researchers interested in this field.
2K
A statistical model-based voice activity detection
TL;DR: An effective hang-over scheme is proposed that accounts for previous observations by modeling speech occurrences as a first-order Markov process; it performs significantly better than the G.729B VAD in low signal-to-noise ratio (SNR) and vehicular noise environments.
1.4K
The voice bank corpus: Design, collection and data analysis of a large regional accent speech database
Christophe Veaux, Junichi Yamagishi, Simon King +2 more
- 01 Nov 2013
TL;DR: The motivation and the processes involved in the design and recording of the Voice Bank corpus, specifically designed for the creation of personalised synthetic voices for individuals with speech disorders, are described.
470
GSVD-based optimal filtering for single and multimicrophone speech enhancement
Simon Doclo, Marc Moonen +1 more
TL;DR: Simulations show that the GSVD-based optimal filtering technique has a better performance than standard fixed and adaptive beamforming techniques for all reverberation times and that it is more robust to deviations from the nominal situation, as, e.g., encountered in uncalibrated microphone arrays.
415