Proceedings Article, DOI 10.1109/IRANIANCEE.2017.7985272
Multiresolution convolutional neural network for robust speech recognition
Navid Naderi, Babak Nasersharif +1 more
- 02 May 2017
- pp 1459-1464
Cited by 13
TL;DR: Recognition accuracy on the Aurora 2 database shows that an MRCNN with two CNNs, using 1×6 and 1×20 convolution filters respectively, outperforms single CNNs and other MRCNN settings in extracting robust features.
Abstract: Convolutional neural networks (CNNs) have recently been used for acoustic modeling and feature extraction in speech recognition systems, with the speech spectrogram or even the raw speech signal as input. In this paper, we propose to use a CNN to learn a filter bank and extract robust features from the noisy speech spectrum. In the proposed approach, the CNN input is the noisy speech spectrum, its outputs are the denoised logarithms of Mel filter bank energies (LMFBs), and the convolution filter size is fixed. Furthermore, we propose to use multiple CNNs with different convolution filter sizes to provide different frequency resolutions for feature extraction from the speech spectrum. We name this method Multiresolution CNN (MRCNN). We combine the outputs of the multiple CNNs in two ways. In the first, we concatenate all outputs to construct the feature vector. In the second, we choose some outputs from each CNN based on its convolution filter size and concatenate them to construct the feature vector. Recognition accuracy on the Aurora 2 database shows that an MRCNN with two CNNs, using 1×6 and 1×20 convolution filters respectively, outperforms single CNNs and other MRCNN settings in extracting robust features.
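The CNN is trained to output denoised LMFBs. For reference, the conventional (non-learned) LMFB computation, from which such a target would typically be obtained, can be sketched as follows. This assumes the target is the LMFB of the corresponding clean speech, which the abstract does not state explicitly. The 8 kHz sampling rate matches Aurora 2; the 256-point FFT, the 23 triangular Mel filters, and the helper names `mel_filterbank` and `lmfb` are illustrative assumptions, not details from the paper.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mel, n_fft, sr):
    """Triangular Mel filters, shape (n_mel, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    # n_mel + 2 points equally spaced on the Mel scale give each filter's edges
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mel + 2)
    hz_pts = mel_to_hz(mel_pts)
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    fb = np.zeros((n_mel, n_bins))
    for m in range(n_mel):
        left, center, right = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def lmfb(power_spectrum, fb, floor=1e-10):
    """Log Mel filterbank energies per frame: log(P @ fb.T), floored to avoid log(0)."""
    return np.log(np.maximum(power_spectrum @ fb.T, floor))


sr, n_fft, n_mel = 8000, 256, 23  # assumed front-end settings
fb = mel_filterbank(n_mel, n_fft, sr)
rng = np.random.default_rng(0)
frames = rng.standard_normal((5, n_fft))          # stand-in for 5 windowed clean-speech frames
power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (5, 129) power spectrum
targets = lmfb(power, fb)
print(fb.shape, targets.shape)  # (23, 129) (5, 23)
```

The Mel filters are narrow at low frequencies and wide at high frequencies, which is the kind of frequency-dependent resolution a learned filter bank would need to reproduce.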
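The multiresolution idea and the two ways of combining CNN outputs can be illustrated with a toy forward pass. Everything about the branch architecture here is hypothetical. Each "CNN" is a single 1×k filter slid along frequency, followed by a ReLU and a random non-negative projection to 23 bands, all with untrained random weights. The names `branch_forward`, `concat_all`, `select_by_filter_size`, and the `split` index are illustrative. The abstract also does not say which outputs are chosen in the second manner. The rule used here is an assumption: the low bands come from the 1×6 branch and the high bands from the 1×20 branch, mirroring the narrow low-frequency and wide high-frequency Mel bands.

```python
import numpy as np

N_BINS = 129  # assumption: 256-point FFT -> 129 spectral bins per frame
N_MEL = 23    # assumption: number of LMFB-like outputs per CNN branch


def branch_forward(spectrum, kernel, proj):
    """Hypothetical one-layer 'CNN' branch with untrained weights.

    spectrum: (n_frames, N_BINS) non-negative noisy magnitude spectrum
    kernel:   (k,) a 1 x k filter slid along the frequency axis
    proj:     (N_BINS - k + 1, N_MEL) non-negative weights pooling conv outputs into bands
    returns:  (n_frames, N_MEL) log filterbank-like energies
    """
    # 'valid' cross-correlation along frequency (what CNN layers compute)
    conv = np.stack([np.correlate(frame, kernel, mode="valid") for frame in spectrum])
    conv = np.maximum(conv, 0.0)    # ReLU
    energies = conv @ proj          # map to N_MEL band energies
    return np.log(energies + 1e-8)  # small offset avoids log(0)


def concat_all(outputs):
    """First manner: concatenate every branch's outputs."""
    return np.concatenate(outputs, axis=1)


def select_by_filter_size(out_small, out_large, split):
    """Second manner under an ASSUMED rule: bands [0, split) from the small-filter
    (finer-resolution) branch, the remaining bands from the large-filter branch."""
    return np.concatenate([out_small[:, :split], out_large[:, split:]], axis=1)


rng = np.random.default_rng(0)
spectrum = np.abs(rng.standard_normal((10, N_BINS)))  # stand-in for 10 noisy frames

outputs = {}
for k in (6, 20):  # the 1x6 and 1x20 filter sizes reported in the abstract
    kernel = rng.random(k)
    proj = rng.random((N_BINS - k + 1, N_MEL))
    outputs[k] = branch_forward(spectrum, kernel, proj)

feat_all = concat_all([outputs[6], outputs[20]])
feat_sel = select_by_filter_size(outputs[6], outputs[20], split=12)
print(feat_all.shape, feat_sel.shape)  # (10, 46) (10, 23)
```

The first manner doubles the feature dimension for two branches (46 here). The assumed selection rule keeps it at the size of a single LMFB vector.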
Citations
A Deep Neural Network Model for Speaker Identification
TL;DR: A deep neural network (DNN) model based on a two-dimensional convolutional neural network and gated recurrent unit (GRU) for speaker identification is proposed and the experimental results showed that the proposed DNN model, which is called deep GRU, achieved a high recognition accuracy of 98.96%.
Cited by 97
Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
Emad M. Grais, Hagen Wierstorf, Dominic Ward, Mark D. Plumbley +3 more
- 02 Jul 2018
TL;DR: The proposed MR-FCN is applied to separate the singing voice from mixtures of music sources, and it improves performance on the audio source separation problem compared to feedforward deep neural networks (DNNs) and single-resolution deep fully convolutional neural networks (FCNs).
Cited by 29
A Convolutional Neural Network model based on Neutrosophy for Noisy Speech Recognition
Elyas Rashno, Ahmad Akbari, Babak Nasersharif +2 more
- 06 Mar 2019
TL;DR: A Neutrosophic Convolutional Neural Network (NCNN) is proposed for the classification task, in which speech signals are used as input data and their noise is modeled as uncertainty.
•Posted Content
Multi-Resolution Fully Convolutional Neural Networks for Monaural Audio Source Separation
TL;DR: In this paper, a multi-resolution fully convolutional neural network (MR-FCNN) is proposed to separate a target audio source from a mixture of many audio sources.
Cited by 15
Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning
TL;DR: In this paper, the class imbalance problem in the stuttering detection (SD) domain is addressed via a multi-branched (MB) training scheme and by weighting the contribution of each class in the overall loss function, resulting in a large improvement on the stuttering classes.
References
•Book
Speech Enhancement: Theory and Practice
Philipos C. Loizou
- 07 Jun 2007
TL;DR: Clear and concise, this book explores how human listeners compensate for acoustic noise in noisy environments and suggests steps that can be taken to realize the full potential of these algorithms under realistic conditions.
Cited by 2.5K
Convolutional neural networks for speech recognition
TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.
Acoustic Modeling Using Deep Belief Networks
TL;DR: It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models with deep neural networks that contain many layers of features and a very large number of parameters.
•Proceedings Article
The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
David Pearce, Hans-Günter Hirsch +1 more
- 01 Jan 2000
TL;DR: A database designed to evaluate the performance of speech recognition algorithms in noisy conditions is presented, together with recognition results for the first standard distributed speech recognition (DSR) feature extraction scheme, which is based on cepstral analysis.
Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks
Tara N. Sainath, Oriol Vinyals, Andrew W. Senior, Hasim Sak +3 more
- 08 Sep 2015
TL;DR: This paper takes advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture, and finds that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.