Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Open Access • Journal Article

Colin Raffel, +8 more

- 01 Jan 2020

- Journal of Machine Learning Research

- Vol. 21, Iss: 140, pp 1-67

2K

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
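As a rough illustration of what "converting every problem into a text-to-text format" can mean in practice, the sketch below casts three kinds of labeled examples as plain (input string, target string) pairs. The function name `to_text_to_text`, the dictionary keys, and the exact prefix strings are assumptions made for illustration; they are not taken from the abstract or from the released code.

```python
from typing import Dict, Tuple


def to_text_to_text(task: str, example: Dict[str, str]) -> Tuple[str, str]:
    """Cast one labeled example as an (input text, target text) pair.

    Hypothetical helper: the task names, field names, and prefixes
    are illustrative assumptions, not the paper's actual interface.
    """
    if task == "translation":
        # The target is simply the translated sentence.
        return ("translate English to German: " + example["source"],
                example["target"])
    if task == "summarization":
        # The target is the reference summary.
        return ("summarize: " + example["document"], example["summary"])
    if task == "classification":
        # The class label is emitted as a literal string.
        return ("classify: " + example["sentence"], example["label"])
    raise ValueError(f"unknown task: {task}")


pair = to_text_to_text(
    "classification",
    {"sentence": "The movie was great.", "label": "positive"},
)
print(pair)
```

Because every task ends up as a string-in, string-out pair, a single sequence-to-sequence model, loss, and decoding procedure can in principle be shared across all of them, which is the property the abstract relies on for its comparisons.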

Citations

•Posted Content

CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers

Shiyang Li, +8 more

- 24 Oct 2020

- arXiv: Computation and Language

TL;DR: This paper proposed controllable counterfactuals (CoCo) to evaluate dialogue state tracking (DST) models on novel scenarios, i.e., would the system still successfully handle the request if the user responded differently but still consistently with the dialogue flow?

37

•Posted Content

Pay Attention to MLPs

Hanxiao Liu, +3 more

- 17 May 2021

- arXiv: Learning

TL;DR: The authors proposed a simple network architecture, gMLP, based on MLPs with gating, and showed that it can perform as well as Transformers in key language and vision applications.

37

•Proceedings Article•10.18653/V1/2021.FINDINGS-ACL.36

GLGE: A New General Language Generation Evaluation Benchmark

Dayiheng Liu, +17 more

- 01 Aug 2021

TL;DR: The General Language Generation Evaluation (GLGE) benchmark is a multi-task benchmark for evaluating the generalization capabilities of NLG models across eight language generation tasks, each with three subtasks graded by difficulty: easy, medium, and hard.

36

•Proceedings Article•10.18653/V1/2021.ACL-LONG.510

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

Chen Liang, +7 more

- 01 Aug 2021

TL;DR: This paper studies a collection of tickets, referred to as "winning tickets", in extremely over-parametrized models such as pre-trained language models. The authors observe that at certain compression ratios, the generalization performance of the winning tickets can not only match but also exceed that of the full model.

36

•Posted Content

DialFact: A Benchmark for Fact-Checking in Dialogue

Prakhar Gupta, +3 more

- 15 Oct 2021

- arXiv: Computation and Language

TL;DR: This article proposed DialFact, a dataset of 22,245 annotated conversational claims paired with pieces of evidence from Wikipedia for fact-checking in dialogue. The authors found that existing fact-checking models trained on non-dialogue data such as FEVER fail to perform well on this task, and they proposed a simple yet data-efficient solution to improve fact-checking performance in dialogue.

36