Open Access | Posted Content
Beyond English-Centric Multilingual Machine Translation
Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
TL;DR: This work creates a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models.
Abstract: Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric, training only on data that was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open-source a training dataset that covers thousands of language directions with supervised data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively with the best single systems of WMT. We open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.
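To give a sense of scale for "any pair of 100 languages", the short sketch below counts ordered translation directions. It is only illustrative arithmetic: the function names are hypothetical, and the totals are the number of possible directions, not the number covered by the mined training data (which the abstract describes only as "thousands").

```python
def count_directions(num_languages: int) -> int:
    """Ordered (source, target) pairs with source != target."""
    return num_languages * (num_languages - 1)


def count_english_centric(num_languages: int) -> int:
    """Directions with English on one side: X -> English and English -> X."""
    return 2 * (num_languages - 1)


# Assumption: 100 languages, one of which is English.
total = count_directions(100)
english_centric = count_english_centric(100)
non_english = total - english_centric

print(f"all directions:         {total}")
print(f"English-centric:        {english_centric}")
print(f"non-English directions: {non_english}")
```

Under these assumptions, English-centric data touches only a small fraction of the possible directions, which is the gap a Many-to-Many model is meant to address.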
Citations
AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model
Saleh Soltan, Shankar Ananthakrishnan, John G. Fitzgerald, Rahul Gupta, Wael Hamza, Haidar Khan, Charith Peris, Stephen Rawls, Andrew Rosenbaum, Anna Rumshisky, Chandan Prakash, Mukund Sridhar, Fabian Triefenbach, Apurv Verma, Gokhan Tur, Prem Natarajan
TL;DR: It is demonstrated that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks.
Enhancing Translation for Indigenous Languages: Experiments with Multilingual Models
A. Tonja, Hellina Nigatu, Olga Kolesnikova, Grigori Sidorov, Alexander Gelbukh, Jugal Kalita
- 27 May 2023
TL;DR: This article used two multilingual models, namely M2M-100 and mBART50, and one bilingual (one-to-one) model, and experimented with different transfer learning setups.
Design of Machine Automatic Translation System Based on Artificial Intelligence
Jiao Huang, Shui Liu
- 02 Nov 2023
TL;DR: The experimental results show that when the semantic sample set size is 10Bit, the accuracy of the translation system is 65%; accuracy increases with the sample set size and remains higher than that of the other systems compared.
Enhancing Depression Detection from Narrative Interviews Using Language Models
Palak Sood, Xinming Yang, Ping Wang
- 05 Dec 2023
TL;DR: A larger dataset, namely I-DAIC, is created for depression detection by integrating three existing datasets in the literature and the effectiveness, advantages, and significant potential of pre-trained language models for depression detection with narrative interviews are demonstrated.
Towards Better Evaluation for Formality-Controlled English-Japanese Machine Translation
Edison Marrese-Taylor, Pin Chen Wang, Yutaka Matsuo
TL;DR: A Transformer-based classification model for Japanese is proposed, which obtains state-of-the-art results in benchmark datasets and provides empirical evidence suggesting that prompting LLMs is a viable approach to control the formality level of En->Ja MT.