Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Open Access • Journal Article

Colin Raffel, +8 more

- 01 Jan 2020

- Journal of Machine Learning Research

- Vol. 21, Iss: 140, pp 1-67

TL;DR: This article introduces a unified framework that converts all text-based language problems into a text-to-text format, and compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks.

Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
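To make the text-to-text idea concrete, the following is a minimal Python sketch of how heterogeneous tasks (translation, classification, regression, summarization) might all be cast as pairs of input text and target text. The helper names are hypothetical, and the task prefixes ("translate English to German:", "cola sentence:", "stsb sentence1:", "summarize:") as well as the rounding of similarity scores to the nearest 0.2 are illustrative assumptions, not the released code or a confirmed specification.

```python
import math
from typing import Tuple

# Hypothetical helpers: each turns one labeled example into an
# (input text, target text) pair so that every task shares the same format.
# The task prefixes are illustrative assumptions, not a confirmed specification.


def format_translation(source: str, reference: str) -> Tuple[str, str]:
    # Translation: a prefix names the task; the target is the reference translation.
    return f"translate English to German: {source}", reference


def format_acceptability(sentence: str, acceptable: bool) -> Tuple[str, str]:
    # Classification: the class label is emitted as words rather than as an index.
    label = "acceptable" if acceptable else "not acceptable"
    return f"cola sentence: {sentence}", label


def format_similarity(first: str, second: str, score: float) -> Tuple[str, str]:
    # Regression: round the score to the nearest 0.2 (halves round up) and
    # write it as a string, so the model "predicts a number" by generating text.
    rounded = math.floor(score * 5 + 0.5) / 5
    return f"stsb sentence1: {first} sentence2: {second}", f"{rounded:.1f}"


def format_summary(document: str, summary: str) -> Tuple[str, str]:
    # Summarization is already text-to-text; only the task prefix is added.
    return f"summarize: {document}", summary


if __name__ == "__main__":
    pairs = [
        format_translation("That is good.", "Das ist gut."),
        format_acceptability("The course is jumping well.", False),
        format_similarity(
            "The rhino grazed on the grass.", "A rhino is grazing in a field.", 3.8
        ),
        format_summary("A long news article ...", "A short summary."),
    ]
    for model_input, target in pairs:
        print(f"{model_input!r} -> {target!r}")
```

Because every target in this sketch is a plain string, a single encoder-decoder model could in principle be trained with one objective and one decoding procedure across all of these tasks; even the regression task is handled by generating a number as text.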

Citations

•Posted Content

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Sanyuan Chen, +16 more

- 26 Oct 2021

- arXiv: Computation and Language

TL;DR: WavLM is a pre-trained model designed to solve full-stack downstream speech tasks; it achieves state-of-the-art performance on the SUPERB benchmark.

•Posted Content

Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision

Chao Jia, +9 more

- 11 Feb 2021

- arXiv: Computer Vision and Pattern Recog...

TL;DR: In this article, a simple dual-encoder architecture is proposed to align the visual and language representations of image and text pairs using a contrastive loss. The authors show that the scale of their corpus can make up for its noise and leads to state-of-the-art representations even with a simple learning scheme.

•Journal Article•10.1016/j.dsp.2022.103514

A survey of modern deep learning based object detection models

- 01 Jun 2022

- Digital signal processing

TL;DR: This paper surveys recent developments in deep-learning-based object detectors, along with some of the prominent backbone architectures used in recognition tasks, and compares the performance of these architectures on multiple metrics.

•Posted Content

CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation

Yue Wang, +3 more

- 02 Sep 2021

- arXiv: Computation and Language

TL;DR: CodeT5 is a unified pre-trained encoder-decoder Transformer model that better leverages the code semantics conveyed by developer-assigned identifiers. It employs a unified framework that seamlessly supports both code understanding and generation tasks and allows for multi-task learning.

•Proceedings Article•10.18653/V1/2021.NAACL-MAIN.45

QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering

Michihiro Yasunaga, +4 more

- 01 Jun 2021

TL;DR: This work proposes a new model, QA-GNN, which addresses the problem of answering questions using knowledge from pre-trained language models (LMs) and knowledge graphs (KGs) through two key innovations: relevance scoring and joint reasoning.
