Multi-Task Self-Supervised Learning for Robust Speech Recognition

doi:10.1109/ICASSP40776.2020.9053569

Open Access•Proceedings Article•10.1109/ICASSP40776.2020.9053569

Mirco Ravanelli, +6 more

- 25 Jan 2020

- pp 6989-6993

361

TL;DR: The paper proposes PASE+, an improved version of PASE that better learns short- and long-term speech dynamics through an efficient combination of recurrent and convolutional networks, and that learns transferable representations suitable for highly mismatched acoustic conditions.

Citations

•Proceedings Article

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

Alexei Baevski, +3 more

- 20 Jun 2020

TL;DR: It is shown for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

4.1K

•Posted Content

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Sanyuan Chen, +16 more

- 26 Oct 2021

- arXiv: Computation and Language

TL;DR: The paper proposes WavLM, a pre-trained model for solving full-stack downstream speech tasks, which achieves state-of-the-art performance on the SUPERB benchmark.

715

•Proceedings Article•10.21437/INTERSPEECH.2021-1775

SUPERB: Speech processing Universal PERformance Benchmark

Shu-wen Yang, +19 more

- 03 May 2021

TL;DR: The Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard that benchmarks the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data.

459

•Proceedings Article•10.21437/interspeech.2022-143

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

- 18 Sep 2022

TL;DR: XLS-R is a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0, with models of up to 2B parameters trained on nearly half a million hours of publicly available speech audio in 128 languages.

269

•Journal Article•10.1109/JSTSP.2022.3207050

Self-Supervised Speech Representation Learning: A Review

Abdelrahman Mohamed, +11 more

- 21 May 2022

- IEEE Journal of Selected Topics in Signa...

TL;DR: This review presents approaches for self-supervised speech representation learning and their connection to other research areas, and reviews recent efforts on benchmarking learned representations to extend the application beyond speech recognition.

235

References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Posted Content

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 22 Dec 2014

- arXiv: Learning

TL;DR: This article introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

82.5K

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: BERT is a new language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

•Journal Article•10.3156/JSOFT.29.5_177_2

Generative Adversarial Nets

Ian Goodfellow, +7 more

- 08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously trained: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

48.6K

•Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, +1 more

- 06 Jul 2015

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

43.7K