Open Access • Journal Article
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TL;DR: This article introduced a unified framework that converts all text-based language problems into a text-to-text format and compared pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.
Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
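As a rough illustration of what "text-to-text" means here, the sketch below casts three different task types as plain (input string, target string) pairs, so that a single sequence-to-sequence model could in principle be trained on all of them. The function name `to_text_to_text`, the task prefixes, and the field names are hypothetical choices made for this example; they are not taken from the paper or its released code.

```python
from typing import Dict, Tuple


def to_text_to_text(task: str, fields: Dict[str, str], target: str) -> Tuple[str, str]:
    """Cast one labelled example as an (input text, target text) pair.

    The prefix and "name: value" layout are assumptions for illustration only.
    """
    # Start with the task name, then append every field as "name: value".
    parts = [task] + [f"{name}: {value}" for name, value in fields.items()]
    return " ".join(parts), target


# Classification, summarization, and question answering all share one format.
examples = [
    to_text_to_text("sentiment", {"sentence": "A charming, funny film."}, "positive"),
    to_text_to_text("summarize", {"article": "The council met on Tuesday to vote."}, "Council votes."),
    to_text_to_text(
        "question",
        {"context": "Paris is the capital of France.", "query": "What is the capital of France?"},
        "Paris",
    ),
]

for source, target in examples:
    print(f"{source!r} -> {target!r}")
```

Because every task is reduced to string-in, string-out, the same model, loss, and decoding procedure can be reused across tasks; only the text changes.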
Citations
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Armen Aghajanyan, Sonal Gupta, Luke Zettlemoyer
- 01 Aug 2021
TL;DR: The authors empirically show that common pre-trained models have a very low intrinsic dimension; in other words, there exists a low dimension reparameterization that is as effective for fine-tuning as the full parameter space.
Knowledge-Grounded Dialogue Generation with Pre-trained Language Models
Xueliang Zhao, Wei Wu, Can Xu, Chongyang Tao, Dongyan Zhao, Rui Yan
- 01 Nov 2020
TL;DR: The authors proposed a knowledge-grounded dialogue generation model with pre-trained language models and an unsupervised approach to jointly optimize knowledge selection and response generation with unlabeled dialogues.
• Proceedings Article
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus
Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner
- 01 Nov 2021
TL;DR: The authors document the Colossal Clean Crawled Corpus (C4), a dataset created by applying a set of filters to a single snapshot of Common Crawl, and evaluate the impact of the filters used to create it.
NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints
Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi
- 01 Jun 2021
TL;DR: This work proposes NeuroLogic Decoding, a simple yet effective algorithm that enables neural language models – supervised or not – to generate fluent text while satisfying complex lexical constraints, and suggests the limit of large-scale neural networks for fine-grained controllable generation and the promise of inference-time algorithms.
• Posted Content
Deduplicating Training Data Makes Language Models Better
Katherine Lee, Daphne Ippolito, Andrew Nystrom, Chiyuan Zhang, Douglas Eck, Chris Callison-Burch, Nicholas Carlini
TL;DR: The authors find that over 1% of the unprompted output of language models trained on existing language modeling datasets is copied verbatim from the training data, and they develop two tools for deduplicating training datasets, for example removing from C4 a single 61-word English sentence repeated over 60,000 times.