Proceedings Article, DOI: 10.23919/EUSIPCO47968.2020.9287618
Multi-Scale Residual Convolutional Encoder Decoder with Bidirectional Long Short-Term Memory for Single Channel Speech Enhancement
Yang Xian, Yang Sun, Wenwu Wang, Syed Mohsen Naqvi
- 24 Jan 2021
- pp 431-435
14
TL;DR: In this paper, a multi-scale convolutional bidirectional long short-term memory (BLSTM) recurrent neural network was proposed for end-to-end single channel speech enhancement.
Abstract: Existing convolutional neural network (CNN) based methods still have limitations in model accuracy, latency and computational cost for single channel speech enhancement. To address these limitations, we propose a multi-scale convolutional bidirectional long short-term memory (BLSTM) recurrent neural network, named McbNet, a deep learning framework for end-to-end single channel speech enhancement. The proposed McbNet enlarges the receptive field in two ways. First, every convolutional layer employs filters of varied dimensions to capture local and global information. Second, the BLSTM is applied to evaluate the interdependency of past, current and future temporal frames. The experimental results confirm that the proposed McbNet offers consistent improvement over state-of-the-art methods on public datasets.
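The first receptive-field idea in the abstract, filters of varied dimensions inside one layer, can be illustrated with a minimal pure-Python sketch. This is not the McbNet implementation: the kernel sizes (3, 5, 7), the fixed averaging weights, and the helper names `conv1d_same` and `multi_scale_layer` are all assumptions chosen for illustration, and the BLSTM stage is omitted.

```python
# Illustrative sketch only; kernel sizes and weights are assumptions,
# not values taken from the paper.

def conv1d_same(signal, kernel):
    """Slide an odd-length kernel over a zero-padded signal.

    The output has the same length as the input.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_scale_layer(signal, kernel_sizes=(3, 5, 7)):
    """Apply one filter per kernel size and return one channel per scale."""
    channels = []
    for size in kernel_sizes:
        # Hypothetical fixed averaging weights; a trained layer would learn them.
        kernel = [1.0 / size] * size
        channels.append(conv1d_same(signal, kernel))
    return channels


# A unit impulse shows how far each filter "sees" around one sample.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
features = multi_scale_layer(impulse)
for size, channel in zip((3, 5, 7), features):
    support = sum(1 for v in channel if v != 0.0)
    print(f"kernel size {size}: impulse spreads over {support} samples")
```

Each channel responds to a different amount of context around the same sample, which is the sense in which mixing filter sizes captures both local and wider information within one layer.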
Citations
Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning
Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley, Wenwu Wang
- 25 Oct 2021
TL;DR: In this article, the authors proposed a method for generating sounds via neural discrete time-frequency representation learning, conditioned on sound classes, which offers an advantage in efficiently modelling long-range dependencies and retaining local fine-grained structures within sound clips.
23
NSE-CATNet: Deep Neural Speech Enhancement using Convolutional Attention Transformer Network
01 Jan 2023
TL;DR: In this paper, a neural speech enhancement (NSE) model using a convolutional encoder-decoder (CED) and a Convolutional Attention Transformer (CAT), named NSE-CATNet, was proposed.
11
Adaptive attention mechanism for single channel speech enhancement
Veeraswamy Parisae, S Nagakishore Bhavanam
7
Spatio-Temporal Features Representation Using Recurrent Capsules for Monaural Speech Enhancement
Jawad Ali, Nasir Saleem, Sami Bourouis, Eatedal Alabdulkreem, Hela El Mannai, Sami Dhahbi
TL;DR: This study presents a model for monaural speech enhancement that keeps spatial information in a capsule and uses dynamic routing to pass it to higher layers. The suggested convolutional recurrent CapNet performs better than models based on CNNs and recurrent neural networks.
6
A Subconvolutional U-net with Gated Recurrent Unit and Efficient Channel Attention Mechanism for Real-Time Speech Enhancement
Sivaramakrishna Yechuri, Sunnydayal Vanambathina
3
References
•Book
Independent Component Analysis
Aapo Hyvärinen, Juha Karhunen, Erkki Oja
- 18 May 2001
TL;DR: Independent component analysis is presented as a statistical generative model that gives a proper probabilistic formulation of the ideas underpinning sparse coding and can be interpreted as providing a Bayesian prior.
Performance measurement in blind audio source separation
TL;DR: This paper considers four different sets of allowed distortions in blind audio source separation algorithms, from time-invariant gains to time-varying filters, and derives a global performance measure using an energy ratio, plus a separate performance measure for each error term.
•Book
Speech Enhancement: Theory and Practice
Philipos C. Loizou
- 07 Jun 2007
TL;DR: Clear and concise, this book explores how human listeners compensate for acoustic noise in noisy environments and suggests steps that can be taken to realize the full potential of these algorithms under realistic conditions.
2.5K
An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech
TL;DR: A short-time objective intelligibility measure (STOI) is presented, which shows high correlation with the intelligibility of noisy and time-frequency weighted noisy speech (e.g., resulting from noise reduction) in three different listening experiments, and better correlation with speech intelligibility than five other reference objective intelligibility models.
2.5K
•Dataset
TIMIT Acoustic-Phonetic Continuous Speech Corpus
John S. Garofolo, Lori Lamel, William M. Fisher, Jonathan C. Fiscus, David S. Pallett, Nancy L. Dahlgren, Victor W. Zue
- 01 Jan 1993
TL;DR: The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences, and includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.
2.4K
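The energy-ratio measure mentioned in the "Performance measurement in blind audio source separation" entry above can be sketched for the simplest allowed distortion, a time-invariant gain. The function name `sdr_gain_invariant` and the toy signals are assumptions for illustration. The sketch lumps all error into one term rather than separating it into the individual error terms the TL;DR mentions.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def sdr_gain_invariant(reference, estimate):
    """Signal-to-distortion ratio in dB, allowing only a time-invariant gain.

    The estimate is projected onto the reference to obtain the target
    component; whatever remains is counted as distortion.
    """
    gain = dot(estimate, reference) / dot(reference, reference)
    target = [gain * r for r in reference]
    error = [e - t for e, t in zip(estimate, target)]
    error_energy = dot(error, error)
    if error_energy == 0.0:
        return math.inf  # the estimate is an exact scaled copy of the reference
    return 10.0 * math.log10(dot(target, target) / error_energy)


# Toy signals (hypothetical): a scaled reference plus a small disturbance
# that is orthogonal to the reference.
reference = [1.0, -1.0, 1.0, -1.0]
noise = [0.1, 0.1, -0.1, -0.1]
estimate = [2.0 * r + n for r, n in zip(reference, noise)]
print(round(sdr_gain_invariant(reference, estimate), 2))
```

Because the gain is estimated by projection, rescaling the estimate does not change the score. Only the component that cannot be explained by the reference counts as distortion.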