Open Access • Posted Content
Improving and Simplifying Pattern Exploiting Training
TL;DR: ADAPET modifies PET's objective to provide denser supervision during fine-tuning and, as a result, outperforms PET on SuperGLUE without any task-specific unlabeled data.
Abstract: Recently, pre-trained language models (LMs) have achieved strong performance
when fine-tuned on difficult benchmarks like SuperGLUE. However, performance
can suffer when there are very few labeled examples available for fine-tuning.
Pattern Exploiting Training (PET) is a recent approach that leverages patterns
for few-shot learning. However, PET uses task-specific unlabeled data. In this
paper, we focus on few-shot learning without any unlabeled data and introduce
ADAPET, which modifies PET's objective to provide denser supervision during
fine-tuning. As a result, ADAPET outperforms PET on SuperGLUE without any
task-specific unlabeled data. Our code can be found at
https://github.com/rrmenon10/ADAPET.
Citations
•Posted Content
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
TL;DR: NATURAL INSTRUCTIONS is a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances; the instructions are obtained from the crowdsourcing instructions used to collect existing NLP datasets and are mapped to a unified schema.
185
Do Prompt-Based Models Really Understand the Meaning of Their Prompts?
01 Jan 2022
TL;DR: This article found that model performance is more dependent on the choice of the LM target words (a.k.a. the verbalizer that converts LM vocabulary prediction to class labels) than on the text of the prompt itself.
126
•Posted Content
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
TL;DR: CrossFit is a task setup for studying cross-task few-shot learning ability that standardizes seen/unseen task splits, data access during different learning stages, and the evaluation protocols.
92
•Posted Content
Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction
TL;DR: This article reformulated relation extraction as an entailment task, using simple, hand-made verbalizations of relations produced in less than 15 minutes per relation, and achieved state-of-the-art performance on TACRED.
70
•Posted Content
Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners
TL;DR: DifferentiAble pRompT (DART) is a pluggable, extensible, and efficient approach that can convert small language models into better few-shot learners without any prompt engineering.
63