Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Open Access · Journal Article

Colin Raffel and 8 co-authors

- 01 Jan 2020

- Journal of Machine Learning Research

- Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
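The following minimal Python sketch illustrates the text-to-text idea by casting three different task types as (input string, target string) pairs. The task prefixes ("translate English to German:", "cola sentence:", "stsb sentence1: ... sentence2:") and the rounding of similarity scores to the nearest 0.2 are modeled on examples in the T5 paper, but they should be treated as assumptions here. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextToTextExample:
    """One training pair: both sides are plain strings."""
    input_text: str
    target_text: str


def format_translation(src: str, tgt: str) -> TextToTextExample:
    # The task is identified only by a text prefix on the input.
    return TextToTextExample(f"translate English to German: {src}", tgt)


def format_acceptability(sentence: str, acceptable: bool) -> TextToTextExample:
    # Class labels become literal target words instead of class indices.
    label = "acceptable" if acceptable else "not acceptable"
    return TextToTextExample(f"cola sentence: {sentence}", label)


def format_similarity(s1: str, s2: str, score: float) -> TextToTextExample:
    # A regression target is rounded to the nearest 0.2 and written out as a
    # string, so "predicting a number" becomes "generating a short string".
    rounded = round(score * 5) / 5
    return TextToTextExample(f"stsb sentence1: {s1} sentence2: {s2}", f"{rounded:.1f}")


if __name__ == "__main__":
    examples = [
        format_translation("That is good.", "Das ist gut."),
        format_acceptability("The course is jumping well.", False),
        format_similarity("The rhino grazed on the grass.", "A rhino is grazing in a field.", 3.8),
    ]
    for ex in examples:
        print(f"{ex.input_text!r} -> {ex.target_text!r}")
```

Because every task is reduced to generating a string, a single model, loss, and decoding procedure can serve translation, classification, and regression alike.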

Citations

Proceedings Article · DOI: 10.18653/V1/2021.ACL-LONG.568

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning

Armen Aghajanyan and 2 co-authors

- 01 Aug 2021

TL;DR: The authors empirically show that common pre-trained models have a very low intrinsic dimension; in other words, there exists a low-dimensional reparameterization that is as effective for fine-tuning as the full parameter space.
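The low-dimensional reparameterization in this TL;DR can be written as theta_D = theta_0 + P theta_d. The pre-trained weights theta_0 stay frozen, a fixed random matrix P maps a small trainable vector theta_d into the full D-dimensional parameter space, and gradients reach theta_d through P^T. The Python/NumPy sketch below is only illustrative. The dense Gaussian projection, its 1/sqrt(D) scaling, the toy quadratic loss, and all names are assumptions, not the cited paper's implementation.

```python
import numpy as np


class RandomSubspaceParams:
    """Hypothetical helper: full parameters are theta_0 + P @ theta_d."""

    def __init__(self, theta_0: np.ndarray, d: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.theta_0 = np.array(theta_0, dtype=float)  # frozen pre-trained weights (flattened)
        big_d = self.theta_0.size
        # Fixed random projection from R^d into R^D. Scaling by 1/sqrt(D) keeps
        # P^T P roughly equal to the identity when d is much smaller than D.
        self.P = rng.standard_normal((big_d, d)) / np.sqrt(big_d)
        self.theta_d = np.zeros(d)  # the only trainable vector

    def full_params(self) -> np.ndarray:
        return self.theta_0 + self.P @ self.theta_d

    def project_grad(self, grad_full: np.ndarray) -> np.ndarray:
        # Chain rule: dL/dtheta_d = P^T dL/dtheta_D.
        return self.P.T @ grad_full


def toy_fit(target, theta_0, d, steps=200, lr=0.5, seed=0):
    """Gradient descent on theta_d only, for the toy loss 0.5 * ||theta - target||^2."""
    params = RandomSubspaceParams(theta_0, d, seed)
    for _ in range(steps):
        grad_full = params.full_params() - target
        params.theta_d -= lr * params.project_grad(grad_full)
    return params


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta_0, target = rng.standard_normal(50), rng.standard_normal(50)
    fitted = toy_fit(target, theta_0, d=5)
    before = 0.5 * np.sum((theta_0 - target) ** 2)
    after = 0.5 * np.sum((fitted.full_params() - target) ** 2)
    print(f"loss before: {before:.3f}, after: {after:.3f}")
```

Only d numbers are trained, yet every one of the D weights can move. Measuring intrinsic dimension amounts to asking how small d can be while fine-tuning in that subspace still works well.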

Proceedings Article · DOI: 10.18653/V1/2020.EMNLP-MAIN.272

Knowledge-Grounded Dialogue Generation with Pre-trained Language Models

Xueliang Zhao and 5 co-authors

- 01 Nov 2020

TL;DR: The authors proposed a knowledge-grounded dialogue generation model with pre-trained language models and an unsupervised approach to jointly optimize knowledge selection and response generation with unlabeled dialogues.

Proceedings Article

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus

Jesse Dodge and 7 co-authors

- 01 Nov 2021

TL;DR: The Colossal Clean Crawled Corpus (C4) is a dataset created by applying a set of filters to a single snapshot of Common Crawl; the authors document this corpus and evaluate the impact of the filters used to create it.
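The sketch below shows, in Python, a few line- and page-level filters of the kind the T5 paper describes for building C4. It keeps only lines that end in terminal punctuation, drops lines that mention JavaScript, and discards pages that contain "lorem ipsum" or a curly bracket. The thresholds, the use of a kept-line count in place of a sentence count, and the function name are assumptions. The bad-word filtering, deduplication, and language-identification steps are omitted.

```python
from typing import Optional

# Lines must end with one of these characters to be kept.
TERMINAL_PUNCT = (".", "!", "?", '"')


def clean_page(text: str, min_words_per_line: int = 3, min_lines: int = 3) -> Optional[str]:
    """Return the cleaned page, or None if the whole page is discarded.

    The thresholds are illustrative parameters, not C4's exact settings.
    """
    # Page-level filters: placeholder text or likely source code drops the page.
    if "lorem ipsum" in text.lower() or "{" in text:
        return None

    kept = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.endswith(TERMINAL_PUNCT):
            continue  # menus, headings, and fragments rarely end in punctuation
        if len(line.split()) < min_words_per_line:
            continue  # too short to be running prose
        if "javascript" in line.lower():
            continue  # boilerplate such as "please enable JavaScript"
        kept.append(line)

    # Page-level length filter, applied after line filtering.
    if len(kept) < min_lines:
        return None
    return "\n".join(kept)
```

Applying cheap heuristics like these per line and per page is what makes the approach scale to a full Common Crawl snapshot. The Dodge et al. paper examines what such filters actually remove.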

Proceedings Article · DOI: 10.18653/V1/2021.NAACL-MAIN.339

NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints

Ximing Lu and 5 co-authors

- 01 Jun 2021

TL;DR: This work proposes NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models, supervised or not, to generate fluent text while satisfying complex lexical constraints. The results suggest the limits of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
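NeuroLogic Decoding expresses lexical constraints in conjunctive normal form. Each clause is an OR of literals that require a word either to appear or not to appear in the output, and every clause must hold. The minimal Python sketch below covers only that representation, a satisfaction check, and a count of satisfied clauses, which is the kind of progress signal a constrained decoder can use to rank partial hypotheses. It does not implement the decoding algorithm. The whitespace tokenization and all names are assumptions.

```python
from typing import List, Set, Tuple

# A literal is (word, must_appear). must_appear=False encodes a negated
# constraint, i.e. "word must NOT appear in the output".
Literal = Tuple[str, bool]
Clause = List[Literal]  # OR over literals
CNF = List[Clause]      # AND over clauses


def literal_holds(tokens: Set[str], lit: Literal) -> bool:
    word, must_appear = lit
    return (word in tokens) == must_appear


def count_satisfied(text: str, cnf: CNF) -> int:
    # Naive whitespace tokenization: punctuation stays attached to words.
    tokens = set(text.lower().split())
    return sum(any(literal_holds(tokens, lit) for lit in clause) for clause in cnf)


def satisfies(text: str, cnf: CNF) -> bool:
    return count_satisfied(text, cnf) == len(cnf)


if __name__ == "__main__":
    # "dog" AND ("ball" OR "frisbee") AND NOT "cat"
    constraints: CNF = [[("dog", True)], [("ball", True), ("frisbee", True)], [("cat", False)]]
    for candidate in ["the dog caught a frisbee", "the dog chased the cat"]:
        print(candidate, count_satisfied(candidate, constraints), satisfies(candidate, constraints))
```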

Posted Content

Deduplicating Training Data Makes Language Models Better

Katherine Lee and 6 co-authors

- 14 Jul 2021

- arXiv: Computation and Language

TL;DR: The authors find that existing language-modeling datasets contain many near-duplicate examples and long repeated substrings, and that over 1% of the unprompted output of language models trained on these datasets is copied verbatim from the training data. They develop two tools for deduplicating training datasets; for example, the tools remove from C4 a single 61-word English sentence that is repeated over 60,000 times.
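The tools in this paper are more involved than what follows: they match repeated substrings and near-duplicate documents rather than whole sentences. The Python sketch below shows only the simpler idea of keeping the first copy of each exact sentence across a corpus. That is enough to collapse a single sentence repeated many times, like the one in the TL;DR. The regex sentence splitter and the function names are assumptions.

```python
import hashlib
import re
from typing import Iterable, List


def split_sentences(doc: str) -> List[str]:
    # Naive splitter: break after ".", "!" or "?" when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def dedup_sentences(docs: Iterable[str]) -> List[str]:
    """Keep only the first occurrence of each exact sentence across all documents."""
    seen = set()
    cleaned = []
    for doc in docs:
        kept = []
        for sentence in split_sentences(doc):
            # Store a fixed-size digest instead of the sentence itself to bound memory.
            key = hashlib.sha1(sentence.encode("utf-8")).hexdigest()
            if key in seen:
                continue
            seen.add(key)
            kept.append(sentence)
        cleaned.append(" ".join(kept))
    return cleaned


if __name__ == "__main__":
    corpus = ["Buy now. Great deal!", "Read our terms. Buy now.", "Buy now."]
    print(dedup_sentences(corpus))
```

Exact matching like this misses near-duplicates that differ by a word or a punctuation mark. Approximate methods are aimed at that gap.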
