Deep Learning Assisted Time-Frequency Processing for Speech Enhancement on Drones

doi:10.1109/TETCI.2020.3014934

Open Access, Journal Article

Lin Wang, +1 more

- 24 Aug 2020

- pp 1-11

Cited by 37

TL;DR: This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNN) and the new application of enhancing speech captured by microphones on a drone, and presents the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones.

Abstract: This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNN) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate the ideal ratio masks at individual time-frequency bins, which are subsequently used to design three potential speech enhancement systems, namely single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF), and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm, which infers the noise-dominance probabilities at individual time-frequency bins from the DNN-estimated soft masks, and then incorporates them into a time-frequency spatial filtering framework for ego-noise reduction. By jointly exploiting the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise) and the time-frequency noise-dominance probability, DNN-TF can suppress the ego-noise effectively in scenarios with very low signal-to-noise ratios (e.g. SNR lower than −15 dB), especially when the direction of the target sound is close to that of a source of the ego-noise. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF and the state-of-the-art time-frequency spatial filtering.
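
The mask-based processing described in the abstract can be illustrated with a minimal NumPy sketch. It assumes STFT arrays are already available. It uses the common IRM definition (|S|²/(|S|²+|N|²))^β with β = 0.5 and replaces the DNN with an oracle mask computed from known speech and noise. The multi-channel part is a generic mask-weighted MVDR beamformer with a known steering vector (for example, one derived from the target DOA). The noise weights 1 − mask, the diagonal loading, and all function names are illustrative assumptions, not the paper's DNN-BF design. The proposed DNN-TF algorithm is not reproduced here.

```python
import numpy as np


def ideal_ratio_mask(speech_stft, noise_stft, beta=0.5, eps=1e-12):
    """Oracle IRM per time-frequency bin: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    beta = 0.5 is a common choice (assumption); values lie in [0, 1].
    """
    s_pow = np.abs(speech_stft) ** 2
    n_pow = np.abs(noise_stft) ** 2
    return (s_pow / (s_pow + n_pow + eps)) ** beta


def apply_mask(mixture_stft, mask):
    """Single-channel enhancement: scale each bin's magnitude, keep the noisy phase."""
    return mask * mixture_stft


def mask_weighted_covariance(multi_stft, weights, eps=1e-12):
    """Weighted spatial covariance per frequency.

    multi_stft: (M, F, T) complex STFT of M microphones.
    weights:    (F, T) non-negative weights, e.g. 1 - mask for the noise (assumption).
    Returns:    (F, M, M) matrices sum_t w x x^H / sum_t w.
    """
    x = np.transpose(multi_stft, (1, 0, 2))  # (F, M, T)
    num = np.einsum("ft,fmt,fnt->fmn", weights, x, x.conj())
    den = weights.sum(axis=1)[:, None, None] + eps
    return num / den


def mvdr_weights(noise_cov, steering, diag_load=1e-6):
    """MVDR weights w(f) = R^-1 d / (d^H R^-1 d) for each frequency.

    noise_cov: (F, M, M); steering: (F, M), assumed known (e.g. from the target DOA).
    """
    n_freq, n_mics, _ = noise_cov.shape
    w = np.zeros((n_freq, n_mics), dtype=complex)
    for f in range(n_freq):
        r = noise_cov[f] + diag_load * np.eye(n_mics)  # diagonal loading for stability
        r_inv_d = np.linalg.solve(r, steering[f])
        w[f] = r_inv_d / (steering[f].conj() @ r_inv_d)
    return w


def beamform(multi_stft, w):
    """Beamformer output y(f, t) = w(f)^H x(f, t); returns an (F, T) array."""
    return np.einsum("fm,mft->ft", w.conj(), multi_stft)
```

In a working system the mask passed to `apply_mask` and `mask_weighted_covariance` would be the DNN estimate rather than the oracle IRM. Weighting the covariance by 1 − mask gives noise-dominated bins more influence on the noise statistics, while the MVDR constraint keeps the response in the steering direction undistorted.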

Citations

Journal Article, doi:10.1109/TASLP.2020.3015027

A Blind Source Separation Framework for Ego-Noise Reduction on Multi-Rotor Drones

Lin Wang, +1 more

- 07 Aug 2020

- IEEE Transactions on Audio, Speech, and ...

TL;DR: A pre-processing algorithm that uses time-frequency spatial filtering (TFS) to generate a reference for pre-aligning the permutation not only improves clustering and permutation alignment but also solves the target-channel selection problem for blind source separation (BSS).

Cited by 32

Journal Article, doi:10.1145/3570955

Transforming Large-Size to Lightweight Deep Neural Networks for IoT Applications

Rahul Mishra, +1 more

- 09 Feb 2023

- ACM Computing Surveys

TL;DR: This article presents a comprehensive overview of the literature on compressing DNNs to reduce energy consumption, storage, and computation requirements for IoT applications, dividing existing approaches into five broad categories: network pruning, sparse representation, bit precision, knowledge distillation, and miscellaneous.

Cited by 27

Journal Article, doi:10.1109/access.2023.3253719

Deep Learning Models for Single-Channel Speech Enhancement on Drones

- 01 Jan 2023

- IEEE Access

TL;DR: The authors train twelve representative deep neural network (DNN) models covering three operation domains (time-frequency magnitude, time-frequency complex, and end-to-end time domain) and three distinct architectures (sequential, encoder-decoder, and generative).

Cited by 21

Journal Article, doi:10.1016/j.apacoust.2021.108590

Multi-sensory sound source enhancement for unmanned aerial vehicle recordings

Benjamin Yen, +3 more

- 01 Feb 2022

- Applied Acoustics

TL;DR: A method is proposed for effective sound source enhancement with an unmanned aerial vehicle (UAV)-mounted audio recording system. It uses audio recordings together with non-acoustic UAV rotor characteristics to improve the accuracy and robustness of rotor-noise power spectral density estimation.

Cited by 15

Journal Article, doi:10.1109/jsen.2022.3207660

Deep-Learning-Assisted Sound Source Localization From a Flying Drone

- 01 Nov 2022

- IEEE Sensors Journal

TL;DR: A deep-learning-based framework that integrates single-channel noise reduction and multichannel source localization is proposed to cope with the ego-noise from the rotating motors and propellers, as well as with the movement of the drone and the sound sources.

Cited by 12

References

Journal Article, doi:10.1121/1.382599

Image method for efficiently simulating small‐room acoustics

Jont B. Allen, +1 more

- 01 Nov 1976

- Journal of the Acoustical Society of Ame...

TL;DR: Discusses the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room. Convolving this impulse response with any desired input signal simulates the room reverberation of that signal.

Cited by 4.2K
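
The image technique summarized above can be made concrete with a simplified Python sketch of an image-source room impulse response. This is not the reference implementation of Allen and Berkley, and not necessarily the simulation setup of the paper. It assumes a single frequency-independent reflection coefficient for all six walls and rounds each arrival to the nearest sample. The function name and parameters are illustrative.

```python
import itertools

import numpy as np


def image_method_rir(room, src, mic, fs=16000, beta=0.8, n_max=3, c=343.0, length=4096):
    """Simplified image-method room impulse response for a shoebox room.

    room: (Lx, Ly, Lz) in metres; src, mic: positions inside the room (must differ).
    beta: one frequency-independent reflection coefficient for all walls (assumption).
    n_max: images are generated for integer indices n in [-n_max, n_max] per axis.
    Each arrival is rounded to the nearest sample (a simplification).
    """
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    h = np.zeros(length)
    indices = range(-n_max, n_max + 1)
    for q in itertools.product((0, 1), repeat=3):
        q = np.array(q)
        for n in itertools.product(indices, repeat=3):
            n = np.array(n)
            # Image position: mirrored (q = 1) or not (q = 0), shifted by 2 n L per axis.
            image = (1 - 2 * q) * src + 2 * n * room
            # Number of wall reflections summed over the three axes.
            n_reflections = np.sum(np.abs(n - q) + np.abs(n))
            dist = np.linalg.norm(image - mic)
            k = int(round(dist / c * fs))  # arrival time in samples
            if k < length:
                # Spherical spreading 1/(4 pi r), one factor beta per reflection.
                h[k] += beta ** n_reflections / (4 * np.pi * dist)
    return h
```

Reverberant speech can then be simulated by convolving a dry signal with the result, for example `np.convolve(dry, image_method_rir(...))`. Every image path is at least as long as the direct path, so nothing arrives before the direct-path sample.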

Journal Article, doi:10.1109/TASL.2007.911054

Evaluation of Objective Quality Measures for Speech Enhancement

Yi Hu, +1 more

- 01 Jan 2008

- IEEE Transactions on Audio, Speech, and ...

TL;DR: Reports the correlations of several objective measures with three subjective rating scales, and proposes several new composite objective measures that combine the individual measures using nonparametric and parametric regression analysis techniques.

Cited by 1.9K

Journal Article, doi:10.1109/TASLP.2014.2364452

A regression approach to speech enhancement based on deep neural networks

Yong Xu, +3 more

- 01 Jan 2015

- IEEE Transactions on Audio, Speech, and ...

TL;DR: The proposed DNN approach suppresses highly nonstationary noise, which is generally hard to handle, and is effective on noisy speech recorded in real-world scenarios without introducing the annoying musical artifacts commonly observed in conventional enhancement methods.

Cited by 1.5K

Journal Article, doi:10.1109/TASLP.2018.2842159

Supervised Speech Separation Based on Deep Learning: An Overview

DeLiang Wang, +1 more

- 01 Oct 2018

- IEEE Transactions on Audio, Speech, and ...

TL;DR: A comprehensive overview of deep-learning-based supervised speech separation, discussing its three main components: learning machines, training targets, and acoustic features.

Cited by 1.4K

Journal Article, doi:10.1109/TASLP.2014.2352935

On training targets for supervised speech separation

Yuxuan Wang, +2 more

- 01 Dec 2014

- IEEE Transactions on Audio, Speech, and ...

TL;DR: Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.

Cited by 1.2K
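
The two ratio-mask training targets named in this summary can be sketched as follows, given complex STFTs of clean speech S, noise N and mixture Y = S + N. The IRM uses the commonly cited form (|S|²/(|S|²+|N|²))^β with β = 0.5. The FFT-MASK is the ratio of clean to noisy STFT magnitudes. The optional `max_value` clipping and the function names are assumptions for illustration, not the exact settings of the cited study.

```python
import numpy as np


def irm(speech_stft, noise_stft, beta=0.5, eps=1e-12):
    """Ideal ratio mask: (|S|^2 / (|S|^2 + |N|^2)) ** beta, bounded in [0, 1]."""
    s_pow = np.abs(speech_stft) ** 2
    n_pow = np.abs(noise_stft) ** 2
    return (s_pow / (s_pow + n_pow + eps)) ** beta


def fft_mask(speech_stft, mixture_stft, max_value=None, eps=1e-12):
    """FFT-MASK: |S| / |Y|; can exceed 1 when speech and noise partly cancel."""
    mask = np.abs(speech_stft) / (np.abs(mixture_stft) + eps)
    if max_value is not None:
        mask = np.minimum(mask, max_value)  # optional clipping (illustrative)
    return mask


# One bin where the noise is in antiphase with the speech.
S = np.array([1.0 + 0.0j])
N = np.array([-0.5 + 0.0j])
print(irm(S, N), fft_mask(S, S + N))
```

Multiplying the FFT-MASK by |Y| recovers |S| exactly, whereas the IRM implicitly assumes that speech and noise powers add, which keeps it within [0, 1]. In the antiphase bin above the FFT-MASK is about 2 while the IRM stays below 1.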