A regression approach to speech enhancement based on deep neural networks

doi:10.1109/TASLP.2014.2364452

Journal Article•10.1109/TASLP.2014.2364452

Yong Xu, +3 more

- 01 Jan 2015

- IEEE Transactions on Audio, Speech, and ...

- Vol. 23, Iss: 1, pp 7-19

1.5K

TL;DR: The proposed DNN approach suppresses highly nonstationary noise, which is generally tough to handle, and is effective on noisy speech recorded in real-world scenarios, without generating the annoying musical artifact commonly observed in conventional enhancement methods.

Abstract: In contrast to conventional minimum mean square error (MMSE)-based noise reduction techniques, we propose a supervised method that enhances speech by finding a mapping function between noisy and clean speech signals based on deep neural networks (DNNs). To handle a wide range of additive noises in real-world situations, a large training set that encompasses many possible combinations of speech and noise types is first designed. A DNN architecture is then employed as a nonlinear regression function to ensure a powerful modeling capability. Several techniques are also proposed to improve the DNN-based speech enhancement system, including global variance equalization to alleviate the over-smoothing problem of the regression model, and dropout and noise-aware training strategies to further improve the generalization capability of DNNs to unseen noise conditions. Experimental results demonstrate that the proposed framework achieves significant improvements in both objective and subjective measures over the conventional MMSE-based technique. The proposed DNN approach also suppresses highly nonstationary noise well, which is generally tough to handle. Furthermore, the resulting DNN model, trained with artificially synthesized data, is also effective on noisy speech data recorded in real-world scenarios, without generating the annoying musical artifact commonly observed in conventional enhancement methods.
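Two of the steps named in the abstract can be illustrated with a minimal sketch: synthesizing a noisy training example from clean speech plus additive noise, and rescaling over-smoothed estimates so that their variance matches a reference. This is not the authors' implementation. The function names, the use of a target SNR in dB to control the mix, and the single scalar scaling factor are all assumptions made for illustration; the abstract only states that training data are artificially synthesized and that global variance equalization is applied.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB.

    Assumption: the target SNR is the knob used to vary training conditions.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must have non-zero power")
    target_noise_power = power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def equalize_global_variance(estimated, reference_variance):
    """Stretch estimates around their mean so their variance equals a reference.

    Assumption: one scalar factor is shared by all values; the reference
    variance would be measured on clean training features.
    """
    mean = sum(estimated) / len(estimated)
    variance = sum((x - mean) ** 2 for x in estimated) / len(estimated)
    if variance == 0.0:
        raise ValueError("estimated features must not be constant")
    beta = math.sqrt(reference_variance / variance)
    return [mean + beta * (x - mean) for x in estimated]


# Toy usage: a sine wave stands in for clean speech, Gaussian samples for noise.
random.seed(0)
speech = [math.sin(2.0 * math.pi * 5.0 * t / 200.0) for t in range(200)]
noise = [random.gauss(0.0, 1.0) for _ in range(200)]
noisy = mix_at_snr(speech, noise, snr_db=5.0)
```

Repeating `mix_at_snr` over many speech utterances, noise recordings, and SNR values is one plausible way to build the kind of large, varied training set the abstract describes; the regression network itself is outside the scope of this sketch.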

Citations

Proceedings Article•10.1109/ICCC47050.2019.9064433

Audio Noise Filter using Cycle Consistent Adversarial Network - CycleGAN ANF

Nam Son Nguyen, +5 more

- 01 Dec 2019

TL;DR: In this paper, the authors propose CycleGAN ANF, a neural network approach that learns to reduce both stationary and non-stationary noise in a fully unsupervised way, by reading in a raw audio sample from a set X (speech mixed with noise) and transforming it so that it sounds as if it belongs to set Y (clean speech).

2

Proceedings Article•10.1109/SIU.2018.8404639

Regression-based speech enhancement by convolutional neural network

Mustafa Erseven, +1 more

- 02 May 2018

TL;DR: A regression-based convolutional neural network model is proposed for speech enhancement to remove noise from conversations, and the results are evaluated by perceptual evaluation of speech quality and short-time objective intelligibility.

2

Journal Article•10.1186/s13636-022-00256-5

A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence

Yang Xiang, +3 more

- 08 Sep 2022

- Eurasip Journal on Audio, Speech, and Mu...

TL;DR: In this paper, the authors propose a supervised single-channel speech enhancement method that combines Kullback-Leibler divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (HMM).

2

Proceedings Article•10.48550/arXiv.2203.11500

Joint Noise Reduction and Listening Enhancement for Full-End Speech Enhancement

Haoyu Li, +2 more

- 22 Mar 2022

TL;DR: In this paper, a deep learning-based joint framework integrating noise reduction (NR) with listening enhancement (LE) is proposed, in which the NR module first suppresses noise and the LE module then modifies the denoised speech to further improve speech intelligibility.

2

Proceedings Article•10.1109/ICSPCS50536.2020.9310026

Targeted Voice Enhancement by Bandpass Filter and Composite Deep Denoising Autoencoder

Raghad Yaseen Lazim AL-Taai, +2 more

- 14 Dec 2020

TL;DR: In this article, a hybrid system for hearing-aid applications is proposed, which separates the target voice from the noisy signal and then enhances the speech based on the user's hearing loss.

2

References

Journal Article•10.1126/SCIENCE.1127647

Reducing the Dimensionality of Data with Neural Networks

Geoffrey E. Hinton, +1 more

- 28 Jul 2006

- Science

TL;DR: In this article, an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data is described.

20.9K

Journal Article•10.1162/NECO.2006.18.7.1527

A fast learning algorithm for deep belief nets

Geoffrey E. Hinton, +2 more

- 01 Jul 2006

- Neural Computation

TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.

18.3K

Book

Learning Deep Architectures for AI

Yoshua Bengio

- 01 Jan 2009

TL;DR: The motivations and principles of learning algorithms for deep architectures are discussed, in particular those that use unsupervised learning of single-layer models, such as Restricted Boltzmann Machines, as building blocks to construct deeper models such as Deep Belief Networks.

8.5K

Posted Content

Improving neural networks by preventing co-adaptation of feature detectors

Geoffrey E. Hinton, +4 more

- 03 Jul 2012

- arXiv: Neural and Evolutionary Computing

TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.

8.5K

Supporting Online Material for Reducing the Dimensionality of Data with Neural Networks

Geoffrey E. Hinton, +1 more

- 01 Jan 2006

TL;DR: This work describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal components analysis as a tool to reduce the dimensionality of data.

7.4K