End-end Speech-to-Text Translation with Modality Agnostic Meta-Learning

doi:10.1109/ICASSP40776.2020.9054759

Proceedings Article•10.1109/ICASSP40776.2020.9054759

Sathish Reddy Indurthi, +6 more

- 04 May 2020

- pp 7904-7908

57

TL;DR: This work adopts a meta-learning algorithm to train a modality-agnostic multi-task model that transfers knowledge from the source tasks (ASR and MT) to the target task (ST), for which training data is severely lacking.
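
The transfer idea in the TL;DR can be illustrated with a toy sketch. It is not the paper's model or training recipe: it assumes a one-parameter linear model, hypothetical noiseless regression tasks standing in for ASR, MT, and ST, and a first-order approximation of model-agnostic meta-learning (the second-order term is dropped for brevity). All function names and hyperparameters are illustrative assumptions.

```python
import numpy as np


def loss_and_grad(w, x, y):
    """Mean squared error of the toy model y ~ w * x, and its gradient w.r.t. w."""
    err = w * x - y
    return float(np.mean(err ** 2)), float(np.mean(2.0 * err * x))


def make_task(slope, n, rng):
    """Hypothetical toy task: n noiseless points on the line y = slope * x."""
    x = rng.normal(size=n)
    return x, slope * x


def fomaml_step(w, tasks, inner_lr=0.05, outer_lr=0.1):
    """One first-order meta-update over a list of (support, query) pairs."""
    meta_grad = 0.0
    for (x_s, y_s), (x_q, y_q) in tasks:
        _, g_s = loss_and_grad(w, x_s, y_s)
        w_task = w - inner_lr * g_s                # inner loop: adapt to one task
        _, g_q = loss_and_grad(w_task, x_q, y_q)   # query gradient at adapted weight
        meta_grad += g_q                           # first-order: ignore d(w_task)/dw
    return w - outer_lr * meta_grad / len(tasks)


def finetune(w, x, y, lr=0.05, steps=3):
    """A few plain gradient steps on the (small) target-task data."""
    for _ in range(steps):
        _, g = loss_and_grad(w, x, y)
        w = w - lr * g
    return w


rng = np.random.default_rng(0)
source_tasks = [
    (make_task(2.0, 20, rng), make_task(2.0, 20, rng)),  # stand-in for ASR
    (make_task(3.0, 20, rng), make_task(3.0, 20, rng)),  # stand-in for MT
]
x_t, y_t = make_task(2.5, 5, rng)  # stand-in for ST: only five examples

# Meta-learn an initialization on the data-rich source tasks.
w_meta = 0.0
for _ in range(200):
    w_meta = fomaml_step(w_meta, source_tasks)

# Fine-tune on the data-poor target task, from the meta-learned and a zero init.
loss_before, _ = loss_and_grad(w_meta, x_t, y_t)
loss_after, _ = loss_and_grad(finetune(w_meta, x_t, y_t), x_t, y_t)
loss_scratch, _ = loss_and_grad(finetune(0.0, x_t, y_t), x_t, y_t)
```

In this toy setting the meta-learned initialization ends up between the two source-task solutions, so the same few fine-tuning steps on the small target set reach a lower loss than starting from zero.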

Citations

10.48550/arxiv.2010.12829

Multilingual Speech Translation with Efficient Finetuning of Pretrained Models

Xian Li, +8 more

TL;DR: This study presents a simple approach to multilingual speech-to-text translation using efficient finetuning of pretrained models, achieving state-of-the-art results on CoVoST 2 with +6.4 BLEU on average across 15 En-X directions and +5.1 BLEU on 19 X-En directions.

Proceedings Article•10.48550/arXiv.2204.06028

CUNI-KIT System for Simultaneous Speech Translation Task at IWSLT 2022

Peter Polák, +7 more

- 12 Apr 2022

TL;DR: This paper explores strategies to utilize an offline model in a simultaneous setting without the need to modify the original model and shows that the onlinized offline model outperforms the best IWSLT2021 simultaneous system in medium and high latency regimes and is almost on par in the low latency regime.

Journal Article•10.48175/ijarsct-15369

A Systematic Survey of Multilingual Speech Transcription and Translation

Vaibhav Ravindra, +2 more

- 07 Feb 2024

- International Journal of Advanced Resear...

TL;DR: The research aims to develop an advanced system capable of seamlessly transcribing speech across diverse linguistic landscapes and underscores the transformative potential of technology in facilitating cross-cultural understanding and enabling meaningful interactions within a multilingual society.

•Journal Article•10.3390/s22093288

A Model-Agnostic Meta-Baseline Method for Few-Shot Fault Diagnosis of Wind Turbines

Xiaobo Liu, +2 more

- 25 Apr 2022

- Sensors

TL;DR: This work proposes a model for few-shot fault diagnosis of wind turbine drivetrains, named model-agnostic meta-baseline (MAMB), and evaluates it on small samples of bearing data from Case Western Reserve University (CWRU) and of generator-bearing and gearbox vibration data from wind turbines under randomly changing operating conditions.

Proceedings Article•10.1109/o-cocosda60357.2023.10482982

Few-shot meta multilabel classifier for low resource accented code-switched speech

Sreeja Manghat, +2 more

- 04 Dec 2023

TL;DR: This work presents a unified classifier-chain meta-training algorithm that uses the feature-reuse property of Almost No Inner Loop (ANIL), and reports few-shot multilabel classification accuracy for Malayalam-English code-switched speech with meta feature reuse.

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposed a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely and achieved state-of-the-art performance on English-to-French translation.

94.2K

Preprint•10.48550/arxiv.1706.03762

Attention Is All You Need

Ashish Vaswani, +7 more

- 01 Jan 2017

Abstract: The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.

51.8K

•Proceedings Article•10.3115/1073083.1073135

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

- 06 Jul 2002

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

28.9K

•Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Jan 2015

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

25.7K

•Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Sep 2014

- arXiv: Computation and Language

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

20.9K