Journal Article, DOI: 10.1016/J.SPECOM.2015.07.003
Deep feature for text-dependent speaker verification
209
TL;DR: Experiments showed that deep feature based methods can obtain significant performance improvements over the traditional baselines, whether they are applied directly in the GMM-UBM system or used as identity vectors.
About: This article was published in Speech Communication on 01 Oct 2015. It focuses on the topics: Deep belief network & Deep learning.
Citations
Speaker recognition based on deep learning: An overview
Zhongxin Bai,Xiao-Lei Zhang +1 more
TL;DR: In this article, the authors review several major subtasks of speaker recognition, including speaker verification, identification, diarization, and robust speaker recognition, with a focus on deep learning-based methods.
275
Posted Content
Speaker Recognition Based on Deep Learning: An Overview
Zhongxin Bai,Xiao-Lei Zhang +1 more
TL;DR: Several major subtasks of speaker recognition are reviewed, including speaker verification, identification, diarization, and robust speaker recognition, with a focus on deep-learning-based methods.
229
Angular Softmax for Short-Duration Text-independent Speaker Verification.
Zili Huang,Shuai Wang,Kai Yu +2 more
02 Sep 2018
TL;DR: In this article, the angular softmax (A-softmax) loss is introduced to improve speaker embedding quality in an end-to-end speaker verification system, where deep discriminant analysis is used for channel compensation.
123
A Review on Human-Computer Interaction and Intelligent Robots
Fuji Ren,Yanwei Bao +2 more
TL;DR: This research highlights the existing technologies of listening, speaking, reading, writing, and other senses, which are widely used in human interaction, and introduces some intelligent robot systems and platforms.
Past review, current progress, and challenges ahead on the cocktail party problem
TL;DR: This overview paper focuses on the speech separation problem given its central role in the cocktail party environment, and describes the conventional single-channel techniques such as computational auditory scene analysis (CASA), non-negative matrix factorization (NMF) and generative models, and the newly developed deep learning-based techniques.
106
References
A fast learning algorithm for deep belief nets
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
Geoffrey E. Hinton,Li Deng,Dong Yu,George E. Dahl,Abdelrahman Mohamed,Navdeep Jaitly,Andrew W. Senior,Vincent Vanhoucke,Patrick Nguyen,Tara N. Sainath,Brian Kingsbury +10 more
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
11.4K
Speaker Verification Using Adapted Gaussian Mixture Models
TL;DR: The major elements of MIT Lincoln Laboratory's Gaussian mixture model (GMM)-based speaker verification system used successfully in several NIST Speaker Recognition Evaluations (SREs) are described.
5K
Front-End Factor Analysis for Speaker Verification
TL;DR: This work extends the authors' previous work by proposing a new speaker representation for speaker verification: a new low-dimensional speaker- and channel-dependent space is defined using simple factor analysis and is named the total variability space because it models both speaker and channel variabilities.
Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
TL;DR: This paper presents a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, and that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.