Deep Learning Assisted Time-Frequency Processing for Speech Enhancement on Drones
Lin Wang, Andrea Cavallaro +1 more
- 24 Aug 2020
- pp 1-11
TL;DR: This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNN) and the new application of enhancing speech captured by microphones on a drone, and presents the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones.
Abstract: This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNN) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate the ideal ratio masks at individual time-frequency bins, which are subsequently used to design three potential speech enhancement systems, namely single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF), and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm, which infers the noise-dominance probabilities at individual time-frequency bins from the DNN-estimated soft masks, and then incorporates them into a time-frequency spatial filtering framework for ego-noise reduction. By jointly exploiting the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise) and the time-frequency noise-dominance probability, DNN-TF can suppress the ego-noise effectively in scenarios with very low signal-to-noise ratios (e.g. SNR lower than −15 dB), especially when the direction of the target sound is close to that of a source of the ego-noise. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF and the state-of-the-art time-frequency spatial filtering.
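The masking step described in the abstract can be illustrated with a small sketch. It computes an oracle ideal ratio mask (IRM) per time-frequency bin from known speech and noise powers, applies it to a noisy magnitude spectrogram, and shows how small the mask becomes at −15 dB SNR. The assumptions are stated plainly: the IRM form `(S / (S + N)) ** beta` with `beta = 0.5` is a common definition from the speech separation literature and may differ from the one used in the paper; in the paper a DNN estimates the mask, whereas here it is computed from ground truth; and `noise_dominance` is a hypothetical stand-in (the noise power fraction), not the inference rule of DNN-TF. All function names are illustrative.

```python
def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Oracle IRM per time-frequency bin: (S / (S + N)) ** beta.

    Inputs are 2-D lists (frames x frequency bins) of non-negative powers.
    eps guards against division by zero in silent bins.
    """
    return [
        [(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(mask, noisy_magnitude):
    """Scale each noisy magnitude bin by its mask value (single-channel enhancement)."""
    return [
        [m * y for m, y in zip(m_row, y_row)]
        for m_row, y_row in zip(mask, noisy_magnitude)
    ]


def noise_dominance(mask):
    """Hypothetical proxy: 1 - mask**2, i.e. N / (S + N) when beta = 0.5.

    This is an assumption for illustration only, not the paper's method.
    """
    return [[1.0 - m * m for m in row] for row in mask]


def irm_from_snr_db(snr_db, beta=0.5):
    """IRM value of a bin whose local SNR is snr_db (in dB)."""
    ratio = 10.0 ** (snr_db / 10.0)  # power ratio S / N
    return (ratio / (1.0 + ratio)) ** beta


if __name__ == "__main__":
    # Toy 2 x 2 spectrogram: rows are frames, columns are frequency bins.
    speech = [[1.0, 0.0], [9.0, 4.0]]
    noise = [[3.0, 16.0], [16.0, 0.0]]
    noisy_mag = [[2.0, 4.0], [5.0, 2.0]]

    mask = ideal_ratio_mask(speech, noise)
    print("mask:", mask)
    print("enhanced:", apply_mask(mask, noisy_mag))
    print("noise dominance (proxy):", noise_dominance(mask))
    print("IRM at -15 dB SNR:", irm_from_snr_db(-15.0))
```

At −15 dB the mask value of a bin is well below 0.2, which gives a sense of how little of each bin is retained by masking alone at the SNRs the abstract targets.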
Citations
A Blind Source Separation Framework for Ego-Noise Reduction on Multi-Rotor Drones
Lin Wang, Andrea Cavallaro +1 more
TL;DR: A pre-processing algorithm which uses time-frequency spatial filtering (TFS) to generate a reference to pre-align the permutation not only improves the performance of clustering and permutation alignment, but also solves the target-channel selection problem for BSS.
Transforming Large-Size to Lightweight Deep Neural Networks for IoT Applications
Rahul Mishra, Hari Prabhat Gupta +1 more
TL;DR: This article presents a comprehensive overview of the existing literature on compressing DNNs to reduce energy consumption, storage, and computation requirements for IoT applications. The authors divide the existing approaches into five broad categories: network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous.
27
Deep Learning Models for Single-Channel Speech Enhancement on Drones
01 Jan 2023
TL;DR: In this article, the authors train twelve representative deep neural network (DNN) models, covering three operation domains (time-frequency magnitude domain, time-frequency complex domain and end-to-end time domain) and three distinct architectures (sequential, encoder-decoder and generative).
Multi-sensory sound source enhancement for unmanned aerial vehicle recordings
TL;DR: This article proposes a method for sound source enhancement from an unmanned aerial vehicle (UAV)-mounted audio recording system. The method uses audio recordings and non-acoustical UAV rotor characteristics to improve the accuracy and robustness of rotor noise power spectral density estimation.
15
Deep-Learning-Assisted Sound Source Localization From a Flying Drone
01 Nov 2022
TL;DR: This article proposes a deep learning-based framework that integrates single-channel noise reduction and multichannel source localization to cope with the ego-noise from rotating motors and propellers, as well as the movement of the drone and the sound sources.
References
Image method for efficiently simulating small‐room acoustics
Jont B. Allen, David A. Berkley +1 more
TL;DR: Presents the theoretical and practical use of image techniques for simulating the impulse response between two points in a small rectangular room. The resulting impulse response, when convolved with any desired input signal, simulates room reverberation of that signal.
Evaluation of Objective Quality Measures for Speech Enhancement
Yi Hu, Philipos C. Loizou +1 more
TL;DR: Reports the correlations of several objective measures with three subjective rating scales, and proposes several new composite objective measures that combine the individual objective measures using nonparametric and parametric regression analysis techniques.
1.9K
A regression approach to speech enhancement based on deep neural networks
TL;DR: The proposed DNN approach can well suppress highly nonstationary noise, which is tough to handle in general, and is effective in dealing with noisy speech data recorded in real-world scenarios without the generation of the annoying musical artifact commonly observed in conventional enhancement methods.
1.5K
Supervised Speech Separation Based on Deep Learning: An Overview
DeLiang Wang, Jitong Chen +1 more
TL;DR: A comprehensive overview of deep learning-based supervised speech separation can be found in this paper, where three main components of supervised separation are discussed: learning machines, training targets, and acoustic features.
1.4K
On training targets for supervised speech separation
TL;DR: Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.