Tuplemax Loss for Language Identification

doi:10.1109/ICASSP.2019.8683313

Open Access•Proceedings Article•10.1109/ICASSP.2019.8683313

Li Wan, +4 more

- 24 Apr 2019

- pp 5976-5980

20

TL;DR: The authors propose a tuplemax loss that models prior knowledge for language identification. It achieves a 2.33% error rate, a 39.4% relative improvement over the standard softmax loss.
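
This page does not give the loss formula. As a rough illustration only, the sketch below assumes that a tuple-based loss of this kind averages the two-class softmax loss between the target language and each non-target language. The function names `softplus` and `pairwise_softmax_loss` are hypothetical and are not taken from the paper.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_softmax_loss(logits: list[float], target: int) -> float:
    """Mean two-class softmax loss of the target against each other class.

    Assumption (not stated on this page): for each non-target class k the
    per-pair loss is -log(exp(z_y) / (exp(z_y) + exp(z_k))), which equals
    softplus(z_k - z_y); the per-pair losses are then averaged.
    """
    if len(logits) < 2:
        raise ValueError("need at least two classes")
    z_y = logits[target]
    pair_losses = [
        softplus(z_k - z_y) for k, z_k in enumerate(logits) if k != target
    ]
    return sum(pair_losses) / len(pair_losses)


if __name__ == "__main__":
    # Hypothetical logits for three languages; index 0 is the true language.
    print(pairwise_softmax_loss([2.0, 0.0, -1.0], target=0))
```

Under this assumption each pair is scored independently of the other classes, so the loss reflects how well the target beats each competitor in turn. The full softmax instead normalizes over all classes at once.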

Citations

•Proceedings Article•10.21437/ODYSSEY.2020-62

Personal VAD: Speaker-Conditioned Voice Activity Detection

Ignacio Lopez Moreno, +4 more

- 01 Nov 2020

TL;DR: Personal VAD is a system that detects the voice activity of a target speaker at the frame level by training a VAD-like neural network conditioned on the target speaker embedding or the speaker verification score.

97

•Proceedings Article•10.21437/INTERSPEECH.2019-1916

Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale.

Hanna Mazzawi, +7 more

- 15 Sep 2019

TL;DR: This paper presents a novel Neural Architecture Search (NAS) framework to improve keyword spotting and spoken language identification models, and demonstrates that the approach can automatically design DNNs that have an order of magnitude fewer parameters and achieve better performance than the current best models.

55

•Proceedings Article•10.1109/ICASSP40776.2020.9052982

ADI17: A Fine-Grained Arabic Dialect Identification Dataset

Suwon Shon, +4 more

- 04 May 2020

TL;DR: This paper collects dialectal Arabic speech from known YouTube channels in 17 Arabic-speaking countries in the Middle East and North Africa to create a large-scale Dialect Identification (DID) dataset, compares state-of-the-art DID techniques on these data, and analyzes a DID system trained on these data.

48

Deep learning for spoken language identification

Matias Lindgren

- 19 May 2020

TL;DR: Aalto University, School of Science; Master's Programme in Computer, Communication and Information Sciences; major: Computer Science (code SCI3042).

44

•Proceedings Article•10.21437/INTERSPEECH.2021-82

End-to-End Language Diarization for Bilingual Code-Switching Speech

Hexin Liu, +6 more

- 30 Aug 2021

TL;DR: Two end-to-end neural configurations for language diarization on bilingual code-switching speech are proposed, including an XSA-E2E architecture based on an x-vector model followed by a self-attention encoder.

34

References

•Journal Article•10.1162/NECO.1997.9.8.1735

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997

- Neural Computation

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced. It can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.

99K

•Proceedings Article•10.1109/ICASSP.2018.8462665

Generalized End-to-End Loss for Speaker Verification

Li Wan, +3 more

- 15 Apr 2018

TL;DR: This paper proposes a new loss function, generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than the authors' previous tuple-based end-to-end (TE2E) loss function.

853

•Posted Content

Generalized End-to-End Loss for Speaker Verification

Li Wan, +3 more

- 28 Oct 2017

- arXiv: Audio and Speech Processing

TL;DR: This paper proposes a new loss function, generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than the authors' previous tuple-based end-to-end (TE2E) loss function and does not require an initial stage of example selection.

568

•Proceedings Article•10.1109/ICASSP.2018.8462628

Speaker Diarization with LSTM

Quan Wang, +4 more

- 12 Apr 2018

TL;DR: In this paper, the authors combine LSTM-based d-vector audio embeddings with recent work in nonparametric clustering to obtain a state-of-the-art speaker diarization system.

316

•Proceedings Article•10.1109/ICASSP.2019.8683892

Fully Supervised Speaker Diarization

Aonan Zhang, +4 more

- 08 Jan 2019

TL;DR: This paper proposes a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), which operates on extracted speaker-discriminative embeddings and decodes in an online fashion, whereas most state-of-the-art systems rely on offline clustering.

309
