Multi-Scale Residual Convolutional Encoder Decoder with Bidirectional Long Short-Term Memory for Single Channel Speech Enhancement

doi:10.23919/EUSIPCO47968.2020.9287618

Proceedings Article•10.23919/EUSIPCO47968.2020.9287618

Yang Xian, +3 more

- 24 Jan 2021

- pp 431-435

14

TL;DR: This paper proposes a multi-scale convolutional bidirectional long short-term memory (BLSTM) recurrent neural network for end-to-end single-channel speech enhancement.

Citations

•Proceedings Article•10.1109/MLSP52302.2021.9596430

Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning

Xubo Liu, +5 more

- 25 Oct 2021

TL;DR: The authors propose a method for generating sounds, conditioned on sound class, via neural discrete time-frequency representation learning. The approach models long-range dependencies efficiently while retaining local fine-grained structures within sound clips.

23

•Journal Article•10.1109/access.2023.3290908

NSE-CATNet: Deep Neural Speech Enhancement using Convolutional Attention Transformer Network

01 Jan 2023

- IEEE Access

TL;DR: This paper proposes NSE-CATNet, a neural speech enhancement (NSE) model that uses a convolutional encoder-decoder (CED) and a Convolutional Attention Transformer (CAT).

11

•Journal Article•10.1007/s11042-024-19076-0

Adaptive attention mechanism for single channel speech enhancement

Veeraswamy Parisae, +1 more

- 04 Apr 2024

- Multimedia Tools and Applications

7

•Journal Article•10.1109/access.2024.3361286

Spatio-Temporal Features Representation Using Recurrent Capsules for Monaural Speech Enhancement

Jawad Ali, +5 more

- IEEE Access

TL;DR: This study presents a model for monaural speech enhancement that keeps spatial information in a capsule and uses dynamic routing to pass it to higher layers. The proposed convolutional recurrent CapNet performs better than models based on CNNs and recurrent neural networks.

6

•Journal Article•10.1007/s11277-024-10874-1

A Subconvolutional U-net with Gated Recurrent Unit and Efficient Channel Attention Mechanism for Real-Time Speech Enhancement

Sivaramakrishna Yechuri, +1 more

- 04 Mar 2024

- Wireless Personal Communications

3

References

•Book

Independent Component Analysis

Aapo Hyvärinen, +2 more

- 18 May 2001

TL;DR: Independent component analysis is presented as a statistical generative model based on sparse coding: a proper probabilistic formulation of the ideas underpinning sparse coding, which can be interpreted as providing a Bayesian prior.

8.4K

•Journal Article•10.1109/TSA.2005.858005

Performance measurement in blind audio source separation

Emmanuel Vincent, +2 more

- 01 Jul 2006

- IEEE Transactions on Audio, Speech, and ...

TL;DR: This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.

3.4K

•Book

Speech Enhancement: Theory and Practice

Philipos C. Loizou

- 07 Jun 2007

TL;DR: Clear and concise, this book explores how human listeners compensate for acoustic noise in noisy environments and suggests steps that can be taken to realize the full potential of these algorithms under realistic conditions.

2.5K

•Journal Article•10.1109/TASL.2011.2114881

An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech

Cees H. Taal, +3 more

- 01 Sep 2011

- IEEE Transactions on Audio, Speech, and ...

TL;DR: A short-time objective intelligibility measure (STOI) is presented. In three different listening experiments it shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction), and it correlates better with speech intelligibility than five other reference objective intelligibility models.

2.5K

•Dataset

TIMIT Acoustic-Phonetic Continuous Speech Corpus

John S. Garofolo, +6 more

- 01 Jan 1993

TL;DR: The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. It includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.

2.4K