Proceedings Article, DOI: 10.1109/APSIPAASC47483.2019.9023204
Dynamic-attention based Encoder-decoder model for Speaker Extraction with Anchor speech
Hao Li,Xueliang Zhang,Guanglai Gao +2 more
- 01 Nov 2019
- pp 297-301
4
TL;DR: This work addresses the problem of extracting a target speaker from an interfering speaker, using a short piece of anchor speech to obtain the target speaker's identity, with an encoder-decoder neural network architecture.
Abstract: Speech plays an important role in human-computer interaction. In many real applications, an annoying problem is that speech is often degraded by interfering noise. Extracting target speech from background interference is a meaningful and challenging task, especially when the interference is also a human voice. This work addresses the problem of extracting a target speaker from an interfering speaker, using a short piece of anchor speech to obtain the target speaker's identity. We propose an encoder-decoder neural network architecture. Specifically, the encoder transforms the anchor speech into an embedding that represents the identity of the target speaker. The decoder uses this speaker identity to extract the target speech from the mixture. To make the speaker identity acoustic-related, a dynamic-attention mechanism is used to build a time-varying embedding for each frame of the mixture. Systematic evaluation indicates that our approach improves the quality of speaker extraction.
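The time-varying embedding described in the abstract can be illustrated with a minimal sketch. The abstract does not state how attention scores are computed, so the scaled dot-product scoring used here is an assumption, as are the names `dynamic_attention_embedding`, `mixture_feats`, and `anchor_feats` and the toy dimensions. This is not the authors' implementation; it only shows how each mixture frame could attend over the encoded anchor frames to obtain its own speaker embedding.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_attention_embedding(mixture_feats, anchor_feats):
    """Build one speaker embedding per mixture frame.

    mixture_feats: (T_mix, D) features of the mixture (hypothetical input).
    anchor_feats:  (T_anchor, D) encoder outputs for the anchor speech.
    Returns the per-frame embeddings (T_mix, D) and the attention
    weights (T_mix, T_anchor).
    """
    d = mixture_feats.shape[-1]
    # Assumed scoring function: scaled dot product between every
    # mixture frame and every anchor frame.
    scores = mixture_feats @ anchor_feats.T / np.sqrt(d)
    # Normalize over anchor frames so each mixture frame gets a
    # weight distribution over the anchor speech.
    weights = softmax(scores, axis=-1)
    # Weighted sum of anchor frames: a different embedding per frame.
    return weights @ anchor_feats, weights


# Toy data standing in for real features.
rng = np.random.default_rng(0)
mixture = rng.standard_normal((6, 8))   # 6 mixture frames, 8-dim features
anchor = rng.standard_normal((4, 8))    # 4 anchor frames, 8-dim features
embedding, weights = dynamic_attention_embedding(mixture, anchor)
```

A fixed embedding would instead pool the anchor frames once (for example by averaging) and reuse the same vector for every mixture frame. In the sketch, a decoder would receive row `t` of `embedding` together with mixture frame `t`.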
Citations
3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications
Shulin He,Jinjiang Liu,Hao Li,Yang-Rui Yang,Fei Chen,Xueliang Zhang +5 more
- 14 Apr 2024
TL;DR: This paper proposes 3S-TSE, a three-stage target speaker extraction method that efficiently isolates a specific voice from multiple speakers using a microphone array, reducing computational load while maintaining superior performance for real-time and low-resource applications.
1
Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism
Wang Sijie,Askar Hamdulla,Mijit Ablimit +2 more
- 31 Oct 2023
TL;DR: Improvements to the baseline system are investigated by incorporating the lightweight CBAM module in the target extractor and the gated fusion module (GFM) in the fusion layer, enabling the model to better leverage the supplementary information provided by the speaker embedding.
1
Coarse-to-Fine Target Speaker Extraction Based on Contextual Information Exploitation
Xue Yang,Changchun Bao,Xianhong Chen +2 more
TL;DR: This paper proposes a novel target speaker extraction approach that integrates a coarse-to-fine framework to exploit contextual information in the time-frequency domain, achieving high performance in both specific and universal scenarios with minimal noise and reverberation.
3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications
Shulin He,Jinjiang Liu,Hao Li,Yang-Rui Yang,Fei Chen,Xueliang Zhang +5 more
TL;DR: This paper addresses the TSE task using microphone array and introduces a novel three-stage solution that systematically decouples the process, setting a new standard for efficient real-time target speaker extraction.
References
•Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba +1 more
- 01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
138.5K
•Posted Content
Adam: A Method for Stochastic Optimization
Diederik P. Kingma,Jimmy Ba +1 more
TL;DR: This work uses adaptive estimates of lower-order moments for first-order gradient-based optimization of stochastic objective functions.
82.5K
•Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
•Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau,Kyunghyun Cho,Yoshua Bengio +2 more
- 01 Jan 2015
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
25.7K
•Posted Content
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
20.9K