Proceedings Article, DOI: 10.1109/ICASSP40776.2020.9054759
End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning
Sathish Reddy Indurthi,Houjeung Han,Nikhil Kumar Lakumarapu,Beom-Seok Lee,Insoo Chung,Sangha Kim,Chanwoo Kim +6 more
- 04 May 2020
- pp 7904-7908
57
TL;DR: This work adopts a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from source tasks=ASR+MT to target task=ST where the ST task severely lacks data.
Abstract: Collecting large amounts of data to train end-to-end Speech Translation (ST) models is more difficult than for the ASR and MT tasks. Previous studies have proposed transfer learning approaches to overcome this difficulty. These approaches benefit from weakly supervised training data, such as ASR speech-to-transcript or MT text-to-text translation pairs. However, the parameters in these models are updated independently of each task, which may lead to sub-optimal solutions. In this work, we adopt a meta-learning algorithm to train a modality agnostic multi-task model that transfers knowledge from source tasks=ASR+MT to target task=ST where the ST task severely lacks data. In the meta-learning phase, parameters are updated in such a way that they act as a good initialization point for the target ST task. We evaluate the proposed meta-learning approach for ST tasks on English-German (En-De) and English-French (En-Fr) language pairs from the Multilingual Speech Translation Corpus (MuST-C). Our method outperforms the previous transfer learning approaches and sets new state-of-the-art results for En-De and En-Fr ST tasks by obtaining 9.18 and 11.76 BLEU point improvements, respectively.
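The idea of meta-learning an initialization on source tasks and then fine-tuning on a data-poor target task can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a generic first-order MAML-style update, and the scalar quadratic losses, function names, learning rates, and task centers are all hypothetical stand-ins (two source tasks playing the role of ASR and MT, one target task playing the role of ST).

```python
# Toy first-order meta-learning sketch (hypothetical; not the paper's code).
# Each task t has a 1-D quadratic loss L_t(theta) = (theta - c_t)^2.
# Two source tasks stand in for ASR and MT; the target task stands in for ST.

def grad(theta, center):
    """Gradient of the quadratic loss (theta - center)^2 with respect to theta."""
    return 2.0 * (theta - center)

def meta_train(theta, source_centers, inner_lr=0.1, meta_lr=0.05, steps=200):
    """First-order MAML-style loop: adapt per task, then update the shared init."""
    for _ in range(steps):
        meta_grad = 0.0
        for c in source_centers:
            # Inner step: one gradient step of task-specific adaptation.
            adapted = theta - inner_lr * grad(theta, c)
            # Outer gradient, first-order approximation: evaluate the task
            # gradient at the adapted parameters and ignore second derivatives.
            meta_grad += grad(adapted, c)
        theta -= meta_lr * meta_grad / len(source_centers)
    return theta

def fine_tune(theta, target_center, lr=0.1, steps=5):
    """A few gradient steps on the target task, mimicking scarce target data."""
    for _ in range(steps):
        theta -= lr * grad(theta, target_center)
    return theta

source_centers = [1.0, 3.0]                        # hypothetical "ASR" and "MT" optima
theta_init = meta_train(0.0, source_centers)       # meta-learned initialization
theta_st = fine_tune(theta_init, 2.2)              # fine-tune on hypothetical "ST"
theta_scratch = fine_tune(0.0, 2.2)                # same budget, no meta-learning
```

In this toy setting the meta-learned starting point lies between the two source optima, so the same small fine-tuning budget ends closer to the target optimum than training from an arbitrary start. Real ST models would replace the scalar parameter with network weights and the quadratic losses with sequence-level training objectives.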
Citations
Multilingual Speech Translation with Efficient Finetuning of Pretrained Models
Xian Li,Changhan Wang,Yun Tang,Chau Tran,Yuqing Tang,Juan Pino,Alexei Baevski,Alexis Conneau,Michael Auli +8 more
TL;DR: This study presents a simple approach to multilingual speech-to-text translation using efficient finetuning of pretrained models, achieving state-of-the-art results on CoVoST 2 with +6.4 BLEU on average across 15 En-X directions and +5.1 BLEU on 19 X-En directions.
CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022
Peter Polák,Ngoc-Quan Ngoc,Tuan-Nam Nguyen,Danni Liu,Carlos Mullov,Jan Niehues,Ondrej Bojar,Alex Waibel +7 more
- 12 Apr 2022
TL;DR: This paper explores strategies to utilize an offline model in a simultaneous setting without the need to modify the original model and shows that the onlinized offline model outperforms the best IWSLT2021 simultaneous system in medium and high latency regimes and is almost on par in the low latency regime.
A Systematic Survey of Multilingual Speech Transcription and Translation
Vaibhav Ravindra,Dheeraj K N,Prof. Vinay Raj +2 more
TL;DR: The research aims to develop an advanced system capable of seamlessly transcribing speech across diverse linguistic landscapes and underscores the transformative potential of technology in facilitating cross-cultural understanding and enabling meaningful interactions within a multilingual society.
A Model-Agnostic Meta-Baseline Method for Few-Shot Fault Diagnosis of Wind Turbines
Xiaobo Liu,Wei Teng,Yibing Liu +2 more
TL;DR: A model for few-shot fault diagnosis of wind turbine drivetrains, named model-agnostic meta-baseline (MAMB), is proposed and evaluated on small samples of bearing data from Case Western Reserve University (CWRU) and on generator-bearing and gearbox vibration data from wind turbines under randomly changing operating conditions.
Few-shot meta multilabel classifier for low resource accented code-switched speech
Sreeja Manghat,Sreeram Manghat,Tanja Schultz +2 more
- 04 Dec 2023
TL;DR: This work presents a unified classifier-chain meta-training algorithm that uses the feature-reuse property of Almost No Inner Loop (ANIL), and reports classification accuracy for few-shot multilabel classification of Malayalam-English code-switched speech with meta feature reuse.
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez,Lukasz Kaiser,Illia Polosukhin +7 more
- 12 Jun 2017
TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.
Attention Is All You Need
Ashish Vaswani,Noam Shazeer,Niki Parmar,Jakob Uszkoreit,Llion Jones,Aidan N. Gomez,Łukasz Kaiser,Illia Polosukhin +7 more
- 01 Jan 2017
Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
51.8K
Bleu: a Method for Automatic Evaluation of Machine Translation
Kishore Papineni,Salim Roukos,Todd Ward,Wei-Jing Zhu +3 more
- 06 Jul 2002
TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
•Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau,Kyunghyun Cho,Yoshua Bengio +2 more
- 01 Jan 2015
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
25.7K
•Posted Content
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
20.9K