Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Open Access Journal Article

Colin Raffel, +8 more

- 01 Jan 2020

- Journal of Machine Learning Research

- Vol. 21, Iss: 140, pp 1-67

2K

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
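
Illustration: the text-to-text format described in the abstract can be sketched as casting every task into a pair of strings, an input and a target. The snippet below is a minimal sketch under stated assumptions, not the authors' released code: the `TextToTextExample` class, the `to_text_to_text` helper, the task prefixes, and the sample sentences are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextToTextExample:
    """One training example where both sides are plain text (hypothetical type)."""
    source: str  # text fed to the model
    target: str  # text the model is trained to produce


def to_text_to_text(task_prefix: str, text: str, label: str) -> TextToTextExample:
    """Cast one labeled example as a (source, target) string pair.

    The prefix tells a single shared model which task to perform.
    The prefix strings used below are illustrative assumptions.
    """
    return TextToTextExample(source=f"{task_prefix}: {text}", target=label)


# Three different problem types, all expressed with the same string-to-string interface.
examples = [
    to_text_to_text("translate English to German", "That is good.", "Das ist gut."),
    to_text_to_text("sentiment", "A delightful film.", "positive"),
    to_text_to_text("summarize", "<long article text>", "<short summary>"),
]

for ex in examples:
    print(f"{ex.source!r} -> {ex.target!r}")
```

Because classification labels, translations, and summaries are all emitted as text, the same model, loss, and decoding procedure can in principle be reused across tasks, which is what makes the systematic comparison described in the abstract possible.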

Citations

•Proceedings Article

Dense Hierarchical Retrieval for Open-Domain Question Answering

Ye Liu, +5 more

- 28 Oct 2021

TL;DR: This article proposed Dense Hierarchical Retrieval (DHR), a hierarchical framework which can generate accurate dense representations of passages by utilizing both macroscopic semantics in the document and microscopic semantics specific to each passage.

14

•Proceedings Article•10.18653/V1/2020.ACL-MAIN.330

MATINF: A Jointly Labeled Large-Scale Dataset for Classification, Question Answering and Summarization

Canwen Xu, +4 more

- 01 Apr 2020

TL;DR: This article proposed MATINF, the first large-scale dataset for cross-task learning in NLP, which contains 1.07 million question-answer pairs with human-labeled categories and user-generated question descriptions.

13

•Posted Content

Multimodal Few-Shot Learning with Frozen Language Models

Maria Tsimpoukelli, +5 more

- 25 Jun 2021

- arXiv: Computer Vision and Pattern Recog...

TL;DR: The authors used aligned image and caption data to train a vision encoder to represent each image as a sequence of continuous embeddings, such that a pre-trained, frozen language model prompted with this prefix generates the appropriate caption.

13

•Proceedings Article•10.1145/3459637.3482009

Matches Made in Heaven: Toolkit and Large-Scale Datasets for Supervised Query Reformulation

Negar Arabzadeh, +4 more

- 26 Oct 2021

TL;DR: In this paper, the authors present three large-scale query reformulation datasets (Diamond, Platinum, and Gold) based on the queries in the MS MARCO dataset, in which each original source query is matched with an alternative query that has perfect retrieval effectiveness.

13

•Proceedings Article•10.18653/V1/2021.NLP4PROG-1.9

Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data

Moshe Hazoom, +2 more

- 01 Aug 2021

TL;DR: SEDE is a dataset of 12,023 pairs of utterances and SQL queries collected from real usage on the Stack Exchange website. It contains a variety of real-world challenges that have rarely been reflected in other semantic parsing datasets.

13