Improving and Simplifying Pattern Exploiting Training

Open Access • Posted Content

- 22 Mar 2021

42

TL;DR: ADAPET modifies PET's objective to provide denser supervision during fine-tuning and outperforms PET on SuperGLUE without any task-specific unlabeled data.
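To make "denser supervision" concrete, here is a minimal Python sketch that contrasts a PET-style label loss (a softmax over the label tokens only) with a decoupled label loss computed from a softmax over the full vocabulary. The decoupled loss is our reading of one of ADAPET's objectives; ADAPET's label-conditioned masked-LM objective is not shown. The function names and the toy 4-token vocabulary are hypothetical, and this is not the authors' code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pet_label_loss(logits: List[float], label_ids: List[int], gold: int) -> float:
    # PET-style: cross-entropy over the label (verbalizer) tokens only;
    # logits of all other vocabulary tokens are ignored.
    probs = softmax([logits[i] for i in label_ids])
    return -math.log(probs[gold])


def decoupled_label_loss(logits: List[float], label_ids: List[int], gold: int) -> float:
    # Assumed ADAPET-style objective: softmax over the FULL vocabulary, then
    # push the gold label token's probability up and each wrong label
    # token's probability down (binary-cross-entropy style).
    probs = softmax(logits)
    loss = -math.log(probs[label_ids[gold]])
    for k, tok in enumerate(label_ids):
        if k != gold:
            loss -= math.log(1.0 - probs[tok])
    return loss


# Hypothetical 4-token vocabulary; tokens 0 and 1 are the verbalizer words.
uniform = [0.0, 0.0, 0.0, 0.0]
boosted = [0.0, 0.0, 3.0, 0.0]  # raise the logit of a non-label token
print(pet_label_loss(uniform, [0, 1], 0), pet_label_loss(boosted, [0, 1], 0))
print(decoupled_label_loss(uniform, [0, 1], 0), decoupled_label_loss(boosted, [0, 1], 0))
```

Raising the logit of a token outside the label set leaves the PET-style loss unchanged but increases the decoupled loss, so under this sketch every vocabulary logit receives a training signal.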

Citations

Posted Content

Cross-Task Generalization via Natural Language Crowdsourcing Instructions

Swaroop Mishra, +3 more

- 18 Apr 2021

- arXiv: Computation and Language

TL;DR: NATURAL INSTRUCTIONS is a dataset of 61 distinct tasks, their human-authored instructions, and 193k task instances, built from the crowdsourcing instructions used to collect existing NLP datasets and mapped to a unified schema.

185

Proceedings Article • 10.18653/v1/2022.naacl-main.167

Do Prompt-Based Models Really Understand the Meaning of Their Prompts?

- 01 Jan 2022

TL;DR: This paper finds that model performance depends more on the choice of LM target words (the verbalizer, which converts LM vocabulary predictions to class labels) than on the text of the prompt itself.
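A verbalizer can be pictured as a plain mapping from class labels to LM target words. The sketch below is a minimal illustration, assuming the model has already produced a score for each vocabulary word at the masked position; the `VERBALIZER` mapping, the `classify` helper, and the scores are hypothetical. Only the verbalizer words are compared, so swapping the target words alone can flip the prediction for an unchanged prompt.

```python
from typing import Dict

# Hypothetical verbalizer for a binary entailment task: class label -> LM target word.
VERBALIZER: Dict[str, str] = {"entailment": "yes", "not_entailment": "no"}


def classify(mask_token_scores: Dict[str, float], verbalizer: Dict[str, str]) -> str:
    # Choose the class whose target word scores highest at the [MASK] slot;
    # scores of words outside the verbalizer are ignored.
    return max(verbalizer, key=lambda label: mask_token_scores[verbalizer[label]])


# Hypothetical LM scores at the masked position for one prompt.
scores = {"yes": 2.1, "no": 0.3, "maybe": 5.0}
print(classify(scores, VERBALIZER))  # entailment

# Same prompt and scores, swapped target words: the prediction flips.
swapped = {"entailment": "no", "not_entailment": "yes"}
print(classify(scores, swapped))  # not_entailment
```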

126

Posted Content

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

Qinyuan Ye, +2 more

- 18 Apr 2021

- arXiv: Computation and Language

TL;DR: CrossFit is a task setup for studying cross-task few-shot learning ability; it standardizes seen/unseen task splits, data access during different learning stages, and evaluation protocols.

92

Posted Content

Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction

Oscar Sainz, +4 more

- 08 Sep 2021

- arXiv: Computation and Language

TL;DR: This paper reformulates relation extraction as an entailment task, using simple, hand-made verbalizations of relations produced in less than 15 minutes per relation, and achieves state-of-the-art performance on TACRED.
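The reformulation can be sketched as follows: each relation gets one or more hand-written hypothesis templates, each template is filled with the entity pair and scored against the sentence by an entailment model, and the best-scoring relation wins unless no score clears a threshold. This is a minimal sketch under stated assumptions. The `TEMPLATES`, the `predict_relation` helper, and the 0.5 threshold for `no_relation` are hypothetical, and `toy_entail_prob` is a keyword stand-in, not a real NLI model.

```python
from typing import Callable, Dict, List

# Hypothetical hand-written verbalizations: relation -> hypothesis templates.
TEMPLATES: Dict[str, List[str]] = {
    "per:city_of_birth": ["{subj} was born in {obj}."],
    "org:founded_by": ["{obj} founded {subj}.", "{subj} was founded by {obj}."],
}


def predict_relation(
    premise: str,
    subj: str,
    obj: str,
    templates: Dict[str, List[str]],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off for predicting no_relation
) -> str:
    best_rel, best_score = "no_relation", threshold
    for rel, patterns in templates.items():
        # A relation's score is the score of its best-matching verbalization.
        score = max(entail_prob(premise, p.format(subj=subj, obj=obj)) for p in patterns)
        if score > best_score:
            best_rel, best_score = rel, score
    return best_rel


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    # Placeholder scorer for illustration only; a real system would use an NLI model.
    return 0.9 if "born" in premise and "born" in hypothesis else 0.1


print(predict_relation("Ada was born in London.", "Ada", "London", TEMPLATES, toy_entail_prob))
# per:city_of_birth
```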

70

Posted Content

Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners

Ningyu Zhang, +7 more

- 30 Aug 2021

- arXiv: Computation and Language

TL;DR: DifferentiAble pRompT (DART) is a pluggable, extensible, and efficient approach that can convert small language models into better few-shot learners without any prompt engineering.

63

References

Proceedings Article • 10.18653/V1/N19-1423

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

TL;DR: BERT pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; the pre-trained model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
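BERT conditions on both sides of a position through a masked language modeling objective: about 15% of input positions are selected for prediction, and of those, 80% are replaced by `[MASK]`, 10% by a random token, and 10% are left unchanged. The sketch below is a minimal illustration that samples each position independently with probability `mask_prob`, a common simplification rather than BERT's exact sampling; the `mask_tokens` name and the word-level tokens are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    rng = rng or random.Random()
    inputs = list(tokens)
    # labels[i] is the original token the model must predict, or None if
    # position i is not selected (no loss is computed there).
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                inputs[i] = "[MASK]"  # 80%: replace with the mask token
            elif r < 0.9:
                inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels


print(mask_tokens(["the", "cat", "sat", "down"], ["dog", "ran"], rng=random.Random(0)))
```

Because the model must predict a selected token from the tokens on both sides of it, every layer can attend in both directions without the target leaking into its own input.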

24.6K

Proceedings Article

PyTorch: An Imperative Style, High-Performance Deep Learning Library

Adam Paszke, +20 more

- 01 Jan 2019

TL;DR: This paper details the principles behind PyTorch's implementation and how they are reflected in its architecture, and explains how a careful, pragmatic implementation of the key runtime components lets them work together to achieve compelling performance.

10.3K

Proceedings Article

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, +2 more

- 23 Dec 2013

TL;DR: The gradient of the class score with respect to the input image is used to compute a class saliency map, which can be used for weakly supervised object segmentation with classification ConvNets.
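A class saliency map assigns each pixel the magnitude of the class score's gradient at that pixel, taking the maximum absolute value across color channels. The paper obtains the gradient by backpropagation; to stay dependency-free, this minimal sketch substitutes central finite differences, which is a swapped-in technique and far slower on real ConvNets. The `saliency_map` name, the `score` callable, and the nested-list image layout are assumptions for illustration.

```python
from typing import Callable, List

Image = List[List[List[float]]]  # indexed as image[row][col][channel]


def saliency_map(score: Callable[[Image], float], image: Image, eps: float = 1e-4) -> List[List[float]]:
    h, w, c = len(image), len(image[0]), len(image[0][0])
    sal = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            best = 0.0
            for k in range(c):
                orig = image[i][j][k]
                # Central-difference estimate of d(score)/d(pixel channel).
                image[i][j][k] = orig + eps
                up = score(image)
                image[i][j][k] = orig - eps
                down = score(image)
                image[i][j][k] = orig  # restore the input value
                grad = (up - down) / (2 * eps)
                best = max(best, abs(grad))  # max |gradient| over channels
            sal[i][j] = best
    return sal


# Hypothetical linear "class score" over a 1x2 image with 2 channels.
W = [[[1.0, -3.0], [0.5, 0.2]]]


def linear_score(img: Image) -> float:
    return sum(W[i][j][k] * img[i][j][k] for i in range(1) for j in range(2) for k in range(2))


print(saliency_map(linear_score, [[[0.1, 0.2], [0.3, 0.4]]]))
```

For a linear score the gradient is just the weight vector, so the map recovers the largest absolute weight per pixel (about 3.0 and 0.5 here).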

7.5K

Proceedings Article

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Zhenzhong Lan, +5 more

- 30 Apr 2020

TL;DR: This work presents two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT, and uses a self-supervised loss that focuses on modeling inter-sentence coherence.
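ALBERT's two parameter-reduction techniques are factorized embedding parameterization (a V x E lookup followed by an E x H projection instead of a single V x H matrix) and cross-layer parameter sharing. The arithmetic below is a minimal sketch of both. V = 30,000, E = 128, H = 4096 are illustrative values, the per-layer count of 7,000,000 is hypothetical, and the helper names are not from the paper.

```python
def untied_embedding_params(vocab: int, hidden: int) -> int:
    # BERT-style: a single vocab x hidden embedding matrix.
    return vocab * hidden


def factorized_embedding_params(vocab: int, emb: int, hidden: int) -> int:
    # ALBERT-style: vocab x emb lookup, then an emb x hidden projection.
    return vocab * emb + emb * hidden


def encoder_params(per_layer: int, num_layers: int, shared: bool) -> int:
    # With cross-layer sharing, all layers reuse one set of weights.
    return per_layer if shared else per_layer * num_layers


V, E, H = 30_000, 128, 4096
print(untied_embedding_params(V, H))  # 122880000
print(factorized_embedding_params(V, E, H))  # 4364288
print(encoder_params(7_000_000, 12, shared=False))  # 84000000
print(encoder_params(7_000_000, 12, shared=True))  # 7000000
```

The factorization pays off whenever E is much smaller than H, since the dominant V x H term becomes V x E.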

4.3K

Proceedings Article • 10.18653/V1/2020.ACL-MAIN.740

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks

Suchin Gururangan, +8 more

- 23 Apr 2020

TL;DR: Multi-phase adaptive pretraining consistently offers large gains in task performance, and adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining are unavailable.

2.7K