Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation

Open Access•Posted Content

- 29 Oct 2019

30

TL;DR: This paper examines the influence of three data augmentation methods on the performance of two S2S model architectures, including two methods of the authors' own development: time perturbation in the frequency domain and sub-sequence sampling.

Citations

•Journal Article•10.1371/JOURNAL.PONE.0254841

An empirical survey of data augmentation for time series classification with neural networks.

Brian Kenji Iwana, +1 more

- 15 Jul 2021

- PLOS ONE

TL;DR: A taxonomy is proposed that outlines the four families of time series data augmentation (transformation-based methods, pattern mixing, generative models, and decomposition methods), along with their application to time series classification with neural networks.

359

•Proceedings Article•10.21437/INTERSPEECH.2020-1855

A New Training Pipeline for an Improved Neural Transducer

Albert Zeyer, +3 more

- 19 May 2020

TL;DR: It is found that the transducer model generalizes much better on longer sequences than the attention model, and outperforms the authors' attention model on Switchboard 300h by over 6% relative WER.

70

•Proceedings Article•10.18653/V1/2020.IWSLT-1.8

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020

Marco Gaido, +3 more

- 04 Jun 2020

TL;DR: For the IWSLT 2020 offline speech translation task, FBK used an end-to-end model, based on an adaptation of the Transformer for speech data, to translate English TED talk audio into German text.

61

•Journal Article•10.1109/JBHI.2020.3024262

Weakly Supervised Histopathology Image Segmentation With Sparse Point Annotations

Zhe Chen, +9 more

- 01 May 2021

- IEEE Journal of Biomedical and Health In...

TL;DR: WESUP is a novel end-to-end weakly supervised learning framework, trained with very sparse point annotations, that performs accurate segmentation, generalizes well on histopathology images, and can even beat an advanced fully supervised segmentation network.

47

References

•Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

- 01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

138.5K

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

94.2K

•Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014

- Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

43.7K

•Proceedings Article•10.21437/INTERSPEECH.2019-2680

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

Daniel S. Park, +6 more

- 18 Apr 2019

TL;DR: This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.

4.5K

•Proceedings Article•10.1109/ICASSP.2016.7472621

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition

William Chan, +3 more

- 20 Mar 2016

TL;DR: Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional speech recognizers is presented.

3K