Fine-tuning

Papers published on a yearly basis

Papers

Proceedings Article • DOI: 10.18653/v1/2022.acl-short.8

P-Tuning: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks

Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang

1 Jan 2022

TL;DR: The method P-Tuning v2 is an implementation of Deep Prompt Tuning (CITATION) optimized and adapted for NLU and can serve as an alternative to finetuning and a strong baseline for future research.

Abstract: Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training. However, in the context of NLU, prior work reveals that prompt tuning does not perform well for normal-sized pretrained models. We also find that existing methods of prompt tuning cannot handle hard sequence labeling tasks, indicating a lack of universality. We present a novel empirical finding that properly optimized prompt tuning can be universally effective across a wide range of model scales and NLU tasks. It matches the performance of finetuning while having only 0.1%-3% tuned parameters. Our method P-Tuning v2 is an implementation of Deep Prompt Tuning (CITATION) optimized and adapted for NLU. Given the universality and simplicity of P-Tuning v2, we believe it can serve as an alternative to finetuning and a strong baseline for future research.
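
To give a feel for the 0.1%-3% range quoted above, here is a minimal sketch that estimates the share of tuned parameters when continuous prompts are attached at every layer of a frozen model. The counting formula (one trainable key vector and one trainable value vector per prompt position per layer) and the example sizes (24 layers, hidden size 1024, roughly 335M backbone parameters, prompt length 20) are illustrative assumptions, not figures taken from the paper.

```python
def tuned_fraction(prompt_len, num_layers, hidden_size, backbone_params):
    """Estimate the share of trainable parameters under deep prompt tuning.

    Assumes each layer receives `prompt_len` trainable key vectors and
    `prompt_len` trainable value vectors of width `hidden_size`, while the
    backbone stays frozen. Task-specific heads are ignored.
    """
    prompt_params = prompt_len * num_layers * 2 * hidden_size
    return prompt_params / backbone_params


# Hypothetical configuration, roughly the size of a large BERT-style encoder.
fraction = tuned_fraction(prompt_len=20, num_layers=24,
                          hidden_size=1024, backbone_params=335_000_000)
print(f"tuned parameters: {fraction:.2%} of the backbone")
```

Under these assumptions the prompt parameters come to a fraction of one percent of the backbone; longer prompts or smaller backbones push the share toward the upper end of the range.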

739 citations

Book Chapter • DOI: 10.1007/978-3-031-19827-4_41

Visual Prompt Tuning

Menglin Jia¹

Cornell University¹

1 Jan 2022

TL;DR: Visual Prompt Tuning (VPT) is proposed as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision.

Abstract: The current modus operandi in adapting pre-trained models involves updating all the backbone parameters, i.e., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) as an efficient and effective alternative to full fine-tuning for large-scale Transformer models in vision. Taking inspiration from recent advances in efficiently tuning large language models, VPT introduces only a small amount (less than 1% of model parameters) of trainable parameters in the input space while keeping the model backbone frozen. Via extensive experiments on a wide variety of downstream recognition tasks, we show that VPT achieves significant performance gains compared to other parameter efficient tuning protocols. Most importantly, VPT even outperforms full fine-tuning in many cases across model capacities and training data scales, while reducing per-task storage cost. Code is available at github.com/kmnp/vpt.

514 citations

Journal Article • DOI: 10.1038/s42256-023-00626-4

Parameter-efficient fine-tuning of large-scale pre-trained language models

Maosong Sun¹, Zhiyuan Liu¹, Denis Couvet²

Tsinghua University¹, University Town of Shenzhen²

02 Mar 2023 - Nature Machine Intelligence

TL;DR: Delta-tuning optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting computation and storage costs, and the study shows that large-scale models can be effectively stimulated by optimizing only a few parameters.

Abstract: With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been continuously shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters is prohibitively costly and eventually becomes practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down computation and storage costs. In general, it demonstrates that large-scale models could be effectively stimulated by the optimization of a few parameters. Despite the various designs, here we discuss and analyse the approaches under a more consistent and accessible term ‘delta-tuning’, where ‘delta’, a mathematical notation often used to denote changes, is borrowed to refer to the portion of parameters that are ‘changed’ during training. We formally describe the problem and propose a unified categorization criterion for existing delta-tuning methods to explore their correlations and differences. We also discuss the theoretical principles underlying the effectiveness of delta-tuning and interpret them from the perspectives of optimization and optimal control. Furthermore, we provide a holistic empirical study on over 100 natural language processing tasks and investigate various aspects of delta-tuning. With comprehensive study and analysis, our research demonstrates the theoretical and practical properties of delta-tuning in the adaptation of PLMs.
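
The core idea in the abstract above, updating only a small subset of parameters and treating the resulting change as the 'delta', can be illustrated with a toy sketch. The parameter names, the scalar "weights", the gradients, and the plain SGD update are all hypothetical stand-ins chosen for illustration; the paper covers a much broader family of methods.

```python
def sgd_step(params, grads, trainable, lr=0.1):
    """Apply one SGD step, updating only the parameters named in `trainable`."""
    return {
        name: value - lr * grads[name] if name in trainable else value
        for name, value in params.items()
    }


def delta(before, after):
    """Return the per-parameter change ('delta') introduced by training."""
    return {name: after[name] - before[name] for name in before}


# Hypothetical toy model: scalar "weights" stand in for full tensors.
pretrained = {"encoder.w": 0.50, "encoder.b": -0.20, "adapter.w": 0.00}
grads = {"encoder.w": 1.0, "encoder.b": 1.0, "adapter.w": 2.0}

# Only the small added module is trainable; the backbone stays frozen.
tuned = sgd_step(pretrained, grads, trainable={"adapter.w"})
changes = delta(pretrained, tuned)
changed = [name for name, d in changes.items() if d != 0.0]
print(changed, f"{len(changed) / len(pretrained):.0%} of parameters changed")
```

Only the non-zero entries of the delta need to be stored per task, which is where the storage saving described in the abstract comes from.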

250 citations

Posted Content

Robust fine-tuning of zero-shot models

Mitchell Wortsman, Gabriel Ilharco, Mike Li, Jong Wook Kim, Hannaneh Hajishirzi, Ali Farhadi, Hongseok Namkoong, Ludwig Schmidt¹

University of Washington¹

04 Sep 2021 - arXiv: Computer Vision and Pattern Recognition

TL;DR: Weight-space ensembles, formed by ensembling the weights of the zero-shot and fine-tuned models, provide large accuracy improvements out-of-distribution while matching or improving in-distribution accuracy.

Abstract: Large pre-trained models such as CLIP offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning approaches substantially improve accuracy in-distribution, they also reduce out-of-distribution robustness. We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models. Compared to standard fine-tuning, the resulting weight-space ensembles provide large accuracy improvements out-of-distribution, while matching or improving in-distribution accuracy. On ImageNet and five derived distribution shifts, weight-space ensembles improve out-of-distribution accuracy by 2 to 10 percentage points while increasing in-distribution accuracy by nearly 1 percentage point relative to standard fine-tuning. These improvements come at no additional computational cost during fine-tuning or inference.
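
A weight-space ensemble of this kind can be sketched as an element-wise interpolation between two sets of weights. The snippet below assumes a linear interpolation with a mixing coefficient `alpha` (the abstract does not spell out the formula), and uses hypothetical parameter names with scalars standing in for tensors.

```python
def interpolate_weights(zero_shot, fine_tuned, alpha):
    """Linearly interpolate two weight dictionaries with identical keys.

    alpha=0 returns the zero-shot weights, alpha=1 the fine-tuned weights.
    """
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("both models must share the same parameter names")
    return {
        name: (1.0 - alpha) * zero_shot[name] + alpha * fine_tuned[name]
        for name in zero_shot
    }


# Hypothetical scalar weights standing in for full tensors.
zero_shot = {"proj.w": 1.0, "proj.b": 0.0}
fine_tuned = {"proj.w": 3.0, "proj.b": 2.0}
ensemble = interpolate_weights(zero_shot, fine_tuned, alpha=0.5)
print(ensemble)
```

Because the result is a single set of weights with the same shape as either input, inference runs one model rather than two, which is consistent with the abstract's claim of no additional inference cost.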

206 citations

Proceedings Article • DOI: 10.1145/3540250.3549113

No more fine-tuning? An experimental evaluation of prompt tuning in code intelligence

Chaozheng Wang, Yuanhang Yang, Cuiyun Gao, Yu Peng, Hongyu Zhang, Michael R. Lyu

24 Jul 2022

Abstract: Pre-trained models have been shown effective in many code intelligence tasks. These models are pre-trained on large-scale unlabeled corpus and then fine-tuned in downstream tasks. However, as the inputs to pre-training and downstream tasks are in different forms, it is hard to fully explore the knowledge of pre-trained models. Besides, the performance of fine-tuning strongly relies on the amount of downstream data, while in practice, the scenarios with scarce data are common. Recent studies in the natural language processing (NLP) field show that prompt tuning, a new paradigm for tuning, alleviates the above issues and achieves promising results in various NLP tasks. In prompt tuning, the prompts inserted during tuning provide task-specific knowledge, which is especially beneficial for tasks with relatively scarce data. In this paper, we empirically evaluate the usage and effect of prompt tuning in code intelligence tasks. We conduct prompt tuning on popular pre-trained models CodeBERT and CodeT5 and experiment with three code intelligence tasks including defect prediction, code summarization, and code translation. Our experimental results show that prompt tuning consistently outperforms fine-tuning in all three tasks. In addition, prompt tuning shows great potential in low-resource scenarios, e.g., improving the BLEU scores of fine-tuning by more than 26% on average for code summarization. Our results suggest that instead of fine-tuning, we could adapt prompt tuning for code intelligence tasks to achieve better performance, especially when lacking task-specific data.
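
One way to picture "prompts inserted during tuning" for a classification task such as defect prediction is a template with a mask slot plus a mapping from label words to classes. The template wording, the `<mask>` token, the label words, and the helper names below are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
MASK = "<mask>"  # assumed mask token; the real token depends on the model

# Hypothetical verbalizer: label words mapped to class ids.
VERBALIZER = {"defective": 1, "clean": 0}


def build_prompt(code, template="The code {code} is {mask}."):
    """Wrap a code snippet in a natural-language template with a mask slot."""
    return template.format(code=code, mask=MASK)


def label_from_scores(word_scores):
    """Pick the class whose label word the model scores highest at the mask."""
    best_word = max(VERBALIZER,
                    key=lambda word: word_scores.get(word, float("-inf")))
    return VERBALIZER[best_word]


prompt = build_prompt("int f(int *p) { return *p; }")
print(prompt)
# Scores would come from a masked language model; these are made up.
print(label_from_scores({"defective": 0.7, "clean": 0.3}))
```

Casting the downstream task as mask filling keeps its input in a form closer to the pre-training objective, which is the mismatch the abstract identifies in standard fine-tuning.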

134 citations

Year	Papers
2025	29
2024	72
2023	128
2022	129
2021	8
2020	3

Related Topics (5)

Performance Metrics