Open Access · Posted Content
Improving sequence-to-sequence speech recognition training with on-the-fly data augmentation
TL;DR: This paper examines the influence of three data augmentation methods on the performance of two S2S model architectures: one method from the literature and two developed by the authors, a time perturbation in the frequency domain and sub-sequence sampling.
Abstract: Sequence-to-Sequence (S2S) models have recently started to show state-of-the-art performance for automatic speech recognition (ASR). With these large and deep models, overfitting remains the largest problem, outweighing the performance improvements that can be obtained from better architectures. One solution to the overfitting problem is to increase the amount of available training data, and the variety it exhibits, with the help of data augmentation. In this paper we examine the influence of three data augmentation methods on the performance of two S2S model architectures. One of the data augmentation methods comes from the literature, while the other two are our own development: a time perturbation in the frequency domain and sub-sequence sampling. Our experiments on Switchboard and Fisher data show state-of-the-art performance for S2S models that are trained solely on the speech training data and do not use additional text data.
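The abstract names the two new augmentations but does not say how they are computed. The sketch below is one plausible reading, not the authors' implementation. It assumes that "time perturbation in the frequency domain" can be approximated by resampling spectral feature frames along the time axis with linear interpolation. It also assumes that sub-sequence sampling crops a contiguous span of tokens together with the frames aligned to it. The function names, the `boundaries` alignment format, the stretch range, and the sampling probability are all hypothetical.

```python
import random


def time_stretch(frames, factor):
    """Resample a (time x freq) feature matrix along time by linear interpolation.

    factor > 1 lengthens the utterance, factor < 1 shortens it.
    Assumption: this stands in for the paper's time perturbation.
    """
    if not frames:
        return []
    n_in = len(frames)
    n_out = max(1, round(n_in * factor))
    out = []
    for t in range(n_out):
        # Position of output frame t on the input time axis.
        pos = t * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def sample_subsequence(frames, tokens, boundaries, rng):
    """Crop a random contiguous token span and the frames aligned to it.

    boundaries[i] = (start_frame, end_frame) for tokens[i], end exclusive.
    Assumption: such an alignment is available from some external source.
    """
    i = rng.randrange(len(tokens))
    j = rng.randrange(i, len(tokens))
    start, end = boundaries[i][0], boundaries[j][1]
    return frames[start:end], tokens[i:j + 1]


def augment(frames, tokens, boundaries, rng, stretch_range=(0.8, 1.25), p_sub=0.5):
    """Apply both augmentations on the fly to one training example."""
    if rng.random() < p_sub:
        frames, tokens = sample_subsequence(frames, tokens, boundaries, rng)
    factor = rng.uniform(*stretch_range)
    return time_stretch(frames, factor), tokens
```

Calling `augment` once per example inside the data loader, with a fresh random draw each epoch, is what makes the augmentation "on the fly": no augmented copies are written to disk, and the model sees a different variant of each utterance every time.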
Citations
An empirical survey of data augmentation for time series classification with neural networks.
Brian Kenji Iwana, Seiichi Uchida
TL;DR: A taxonomy of time series data augmentation is proposed that outlines four families of methods (transformation-based methods, pattern mixing, generative models, and decomposition methods), together with their application to time series classification with neural networks.
Findings of the IWSLT 2020 Evaluation Campaign
Ebrahim Ansari, Amittai Axelrod, Nguyen Bach, Ondrej Bojar, Roldano Cattoni, Fahim Dalvi, Nadir Durrani, Marcello Federico, Christian Federmann, Jiatao Gu, Fei Huang, Kevin Knight, Xutai Ma, Ajay Nagesh, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Xing Shi, Sebastian Stüker, Marco Turchi, Alex Waibel, Changhan Wang
- 01 Jul 2020
TL;DR: Each track’s goal, data and evaluation metrics are introduced, and the results of the received submissions are reported.
A New Training Pipeline for an Improved Neural Transducer
Albert Zeyer, André Merboldt, Ralf Schlüter, Hermann Ney
- 19 May 2020
TL;DR: It is found that the transducer model generalizes much better on longer sequences than the attention model, and outperforms the authors' attention model on Switchboard 300h by over 6% relative WER.
End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi
- 04 Jun 2020
TL;DR: In the IWSLT 2020 offline speech translation task, FBK used an end-to-end model, based on an adaptation of the Transformer to speech data, to translate English TED talk audio into German text.
Weakly Supervised Histopathology Image Segmentation With Sparse Point Annotations
Zhe Chen, Zhao Chen, Jingxin Liu, Qiang Zheng, Yuang Zhu, Yanfei Zuo, Zhaoyu Wang, Xiaosong Guan, Yue Wang, Yuan Li
TL;DR: A novel end-to-end weakly supervised learning framework named WESUP, trained by very sparse point annotations, that performs accurate segmentation and exhibits good generalizability in histopathology images and can even beat an advanced fully supervised segmentation network.
References
• Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
- 01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
• Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
• Journal Article
Dropout: a simple way to prevent neural networks from overfitting
TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le
- 18 Apr 2019
TL;DR: This work presents SpecAugment, a simple data augmentation method for speech recognition that is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients) and achieves state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work.
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals
- 20 Mar 2016
TL;DR: This work presents Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs, or other components of traditional speech recognizers.