Tuplemax Loss for Language Identification
Li Wan,Prashant Sridhar,Yang Yu,Quan Wang,Ignacio Lopez Moreno +4 more
- 24 Apr 2019
- pp 5976-5980
20
TL;DR: The authors propose a tuplemax loss that builds prior knowledge about the user's languages into training for language identification. It achieves a 2.33% error rate, a relative 39.4% improvement over the standard softmax loss.
Abstract: In many language identification scenarios, the user specifies a small set of languages they can speak instead of the large set of all possible languages. For example, about 95% of the users of a typical language identification system launched in North America speak no more than two languages. We model this prior knowledge in the way we train our neural networks by replacing the commonly used softmax loss function with a novel loss function named tuplemax loss. Using the tuplemax loss, our system achieved a 2.33% error rate, a relative 39.4% improvement over the 3.85% error rate of the standard softmax loss method.
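The abstract names the tuplemax loss but does not define it. The following is a minimal sketch, assuming the loss averages two-class softmax negative log-likelihoods between the target language and each other language, so that training mirrors an inference setting where only a small tuple of candidate languages competes. The function names `softmax_loss` and `tuplemax_loss`, the pairwise form, and the sign convention are assumptions for illustration; the paper itself gives the exact formulation.

```python
import math


def softmax_loss(logits, target):
    """Standard softmax cross-entropy: -log(exp(z_y) / sum_k exp(z_k))."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def tuplemax_loss(logits, target):
    """Assumed pairwise form (hypothetical sketch, not the authors' code):
    mean over k != y of -log(exp(z_y) / (exp(z_y) + exp(z_k)))."""
    z_y = logits[target]
    pair_losses = []
    for k, z_k in enumerate(logits):
        if k == target:
            continue
        # -log(sigmoid(z_y - z_k)) = log(1 + exp(z_k - z_y)), computed stably
        d = z_k - z_y
        pair_losses.append(max(d, 0.0) + math.log1p(math.exp(-abs(d))))
    return sum(pair_losses) / len(pair_losses)


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0, 3.0]  # made-up scores for four languages
    print("softmax loss :", softmax_loss(logits, target=0))
    print("tuplemax loss:", tuplemax_loss(logits, target=0))
```

Under this assumed form, the two losses coincide when there are only two classes. With more classes, each competing language contributes its own pairwise term instead of sharing a single normalizer with all the others.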
Citations
Personal VAD: Speaker-Conditioned Voice Activity Detection
Ignacio Lopez Moreno,Li Wan,Quan Wang,Shaojin Ding,Shuo-Yiin Chang +4 more
- 01 Nov 2020
TL;DR: Personal VAD is a system that detects the voice activity of a target speaker at the frame level by training a VAD-like neural network conditioned on the target speaker embedding or the speaker verification score.
97
Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale.
Hanna Mazzawi,Xavi Gonzalvo,Aleks Kracun,Prashant Sridhar,Niranjan Subrahmanya,Ignacio Lopez-Moreno,Hyun-Jin Park,Patrick Violette +7 more
- 15 Sep 2019
TL;DR: This paper presents a novel Neural Architecture Search (NAS) framework to improve keyword spotting and spoken language identification models, and demonstrates that this approach can automatically design DNNs with an order of magnitude fewer parameters that achieve better performance than the current best models.
55
ADI17: A Fine-Grained Arabic Dialect Identification Dataset
Suwon Shon,Ahmed Ali,Younes Samih,Hamdy Mubarak,James Glass +4 more
- 04 May 2020
TL;DR: This paper collects dialectal Arabic from known YouTube channels in 17 Arabic-speaking countries in the Middle East and North Africa to create a large-scale Dialect Identification (DID) dataset, compares state-of-the-art DID techniques on these data, and analyzes a DID system trained on these data.
48
Deep learning for spoken language identification
Matias Lindgren
- 19 May 2020
TL;DR: Master's thesis, Aalto University, School of Science (Master's programme in Computer, Communication and Information Sciences; major: Computer Science; code SCI3042).
44
End-to-End Language Diarization for Bilingual Code-Switching Speech
Hexin Liu,Leibny Paola García Perera,Xinyi Zhang,Justin Dauwels,Andy W. H. Khong,Sanjeev Khudanpur,Suzy J. Styles +6 more
- 30 Aug 2021
TL;DR: Two end-to-end neural configurations for language diarization on bilingual code-switching speech are proposed, one of which is an XSA-E2E architecture based on an x-vector model followed by a self-attention encoder.
34
References
Long short-term memory
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
99K
Generalized End-to-End Loss for Speaker Verification
Li Wan,Quan Wang,Alan Papir,Ignacio Lopez Moreno +3 more
- 15 Apr 2018
TL;DR: This paper proposes a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than the authors' previous tuple-based end-to-end loss function.
853
•Posted Content
Generalized End-to-End Loss for Speaker Verification
TL;DR: This paper proposes a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than the authors' previous tuple-based end-to-end (TE2E) loss function and does not require an initial stage of example selection.
Speaker Diarization with LSTM
Quan Wang,Carlton Downey,Li Wan,Philip Andrew Mansfield,Ignacio Lopez Moreno +4 more
- 12 Apr 2018
TL;DR: In this paper, the authors combine LSTM-based d-vector audio embeddings with recent work in nonparametric clustering to obtain a state-of-the-art speaker diarization system.
316
Fully Supervised Speaker Diarization
Aonan Zhang,Quan Wang,Zhenyao Zhu,John Paisley,Chong Wang +4 more
- 08 Jan 2019
TL;DR: This paper proposes a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), which, given extracted speaker-discriminative embeddings, decodes in an online fashion, whereas most state-of-the-art systems rely on offline clustering.
309