Dynamic-attention based Encoder-decoder model for Speaker Extraction with Anchor speech

doi:10.1109/APSIPAASC47483.2019.9023204

Proceedings Article

Hao Li, +2 more

- 01 Nov 2019

- pp 297-301


TL;DR: This work addresses the problem of extracting a target speaker from an interfering speaker using a short piece of anchor speech, which is used to obtain the target speaker's identity, with an encoder-decoder neural network architecture.
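The TL;DR does not spell out how the anchor speech conditions the network. As a rough illustration only, the sketch below shows one generic way an anchor utterance can steer extraction: for each mixture frame, attention weights over the anchor frames are recomputed, the resulting context vector summarizes the target speaker, and a sigmoid gate scales the mixture frame by its similarity to that context. Everything here (the function `extract_target`, scaled dot-product scoring, a scalar per-frame gate, plain Python lists as features) is a hypothetical assumption and not the paper's dynamic-attention encoder-decoder.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def extract_target(mixture: List[Vector], anchor: List[Vector]) -> List[Vector]:
    """Hypothetical toy extractor (not the paper's model): gate each mixture
    frame by its similarity to an attention-weighted summary of the anchor."""
    dim = len(anchor[0])
    scale = 1.0 / math.sqrt(dim)
    output = []
    for frame in mixture:
        # Attention over the anchor frames is recomputed for every mixture
        # frame, so the speaker context can change from frame to frame.
        weights = softmax([dot(frame, a) * scale for a in anchor])
        context = [sum(w * a[k] for w, a in zip(weights, anchor)) for k in range(dim)]
        # Scalar gate in (0, 1): above 0.5 for frames aligned with the
        # context, exactly 0.5 for frames orthogonal to it.
        gate = sigmoid(dot(frame, context) * scale)
        output.append([gate * x for x in frame])
    return output


# Usage: the anchor resembles the first mixture frame but not the second.
extracted = extract_target(mixture=[[2.0, 0.0], [0.0, 2.0]],
                           anchor=[[1.0, 0.0], [1.0, 0.0]])
```

In this toy run the target-like frame is kept with a gate of about 0.80, while the orthogonal frame only receives the neutral gate of 0.5. A trained model would learn the scoring and gating rather than using fixed dot products.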


Citations

Proceedings Article•10.1109/icassp48485.2024.10448000

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

Shulin He, +5 more

- 14 Apr 2024

TL;DR: This paper proposes 3S-TSE, a three-stage target speaker extraction method that efficiently isolates a specific voice from multiple speakers using a microphone array, reducing computational load while maintaining superior performance for real-time and low-resource applications.


Proceedings Article•10.1109/apsipaasc58517.2023.10317106

Target Speaker Extraction with Attention Enhancement and Gated Fusion Mechanism

Wang Sijie, +2 more

- 31 Oct 2023

TL;DR: Improvements to the baseline system are investigated by incorporating the lightweight CBAM module into the target extractor and a gated fusion module (GFM) into the fusion layer, enabling the model to better leverage the supplementary information provided by the speaker embedding.


Journal Article•10.1109/taslp.2024.3440638

Coarse-to-Fine Target Speaker Extraction Based on Contextual Information Exploitation

Xue Yang, +2 more

- 01 Jan 2024

- IEEE/ACM transactions on audio, speech, ...

TL;DR: This paper proposes a novel target speaker extraction approach that integrates a coarse-to-fine framework to exploit contextual information in the time-frequency domain, achieving high performance in both specific and universal scenarios with minimal noise and reverberation.


Journal Article•10.48550/arxiv.2312.10979

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

Shulin He, +5 more

- 18 Dec 2023

- arXiv.org

TL;DR: This paper addresses the TSE task using microphone array and introduces a novel three-stage solution that systematically decouples the process, setting a new standard for efficient real-time target speaker extraction.


References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
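Adam appears in this paper's reference list. The update rule that work describes (exponential moving averages of the gradient and of the squared gradient, bias-corrected for their zero initialization, then a per-parameter scaled step) can be sketched for a single scalar parameter as below. The function name `adam_minimize` and the quadratic test objective are illustrative assumptions; the defaults are the values recommended in the Adam paper (α = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8).

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], x0: float, steps: int,
                  lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8) -> float:
    """Minimize a scalar function with Adam, given its gradient function."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second raw-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Usage: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_star = adam_minimize(lambda x: 2 * (x - 3), x0=0.0, steps=1000, lr=0.1)
```

Because of the bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient's scale, which is part of what makes Adam's step size easy to reason about.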


•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: In this article, adaptive estimates of lower-order moments are used for first-order gradient-based optimization of stochastic objective functions.


•Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014

- Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.


•Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Jan 2015

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
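The soft-search described in this TL;DR is additive (Bahdanau-style) attention: each encoder state h_j is scored against the previous decoder state s with e_j = v^T tanh(W s + U h_j), the scores are normalized with a softmax, and the context vector is the weighted sum of the h_j. A minimal pure-Python sketch follows, assuming small hand-written matrices in place of learned parameters and a hypothetical function name `additive_attention`. Whether the speaker-extraction model above uses exactly this scoring function is not stated on this page.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, x: Vector) -> Vector:
    return [sum(w * xi for w, xi in zip(row, x)) for row in M]


def softmax(xs: List[float]) -> List[float]:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev: Vector, H: List[Vector], W: Matrix, U: Matrix,
                       v: Vector) -> Tuple[List[float], Vector]:
    """Return attention weights over encoder states H and the context vector."""
    Ws = matvec(W, s_prev)  # decoder-state projection, shared by every h_j
    scores = []
    for h in H:
        Uh = matvec(U, h)
        # e_j = v^T tanh(W s_prev + U h_j)
        scores.append(sum(vi * math.tanh(a + b) for vi, a, b in zip(v, Ws, Uh)))
    alpha = softmax(scores)
    dim = len(H[0])
    context = [sum(a * h[k] for a, h in zip(alpha, H)) for k in range(dim)]
    return alpha, context


# Usage with identity projections: the first encoder state aligns with s_prev.
I2 = [[1.0, 0.0], [0.0, 1.0]]
alpha, context = additive_attention(s_prev=[1.0, 1.0],
                                    H=[[1.0, 1.0], [-1.0, -1.0]],
                                    W=I2, U=I2, v=[1.0, 1.0])
```

Here the first encoder state receives the larger weight, so the context vector leans toward it instead of being a fixed-length summary of the whole input.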


•Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Sep 2014

- arXiv: Computation and Language

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
