Plug-and-Play Document Modules for Pre-trained Models

doi:10.48550/arXiv.2305.17660

Proceedings Article•10.48550/arXiv.2305.17660

Chaojun Xiao, +9 more

- 28 May 2023

Vol. abs/2305.17660

4

TL;DR: The authors propose to represent each document as a plug-and-play document module, i.e., a document plugin, for pre-trained models (PTMs).

Abstract: Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement results in the repeated encoding of the same documents for different tasks and queries, which is highly computationally inefficient. To this end, we aim to decouple document encoding from downstream tasks, and propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document once to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders. Extensive experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios. In particular, PlugD can save 69% of computational costs while achieving comparable performance to state-of-the-art encoding-task coupling methods. Additionally, we show that PlugD can serve as an effective post-processing method for injecting knowledge into task-specific models, improving model performance without any additional model training. Our code and checkpoints can be found at https://github.com/thunlp/Document-Plugin.
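
The encode-once idea in the abstract can be illustrated with a minimal sketch. This is not the PlugD implementation: `toy_encode`, `PluginStore`, and the two cost helpers are hypothetical names introduced here, and the encoder is a trivial stand-in for an expensive PTM forward pass. The sketch only shows why caching a task-agnostic document representation reduces the number of document-encoding passes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]


def toy_encode(text: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for an expensive PTM document encoder."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


@dataclass
class PluginStore:
    """Caches one task-agnostic representation ("plugin") per document."""
    encoder: Callable[[str], Vector] = toy_encode
    encode_calls: int = 0
    _cache: Dict[str, Vector] = field(default_factory=dict, repr=False)

    def get(self, doc_id: str, text: str) -> Vector:
        # Encode only on first request; later tasks and queries reuse the result.
        if doc_id not in self._cache:
            self._cache[doc_id] = self.encoder(text)
            self.encode_calls += 1
        return self._cache[doc_id]


def coupled_encoder_passes(num_docs: int, num_tasks: int, queries_per_task: int) -> int:
    """Document-encoding passes when each (task, query) re-encodes every document."""
    return num_docs * num_tasks * queries_per_task


def decoupled_encoder_passes(num_docs: int) -> int:
    """Document-encoding passes when each document is encoded exactly once."""
    return num_docs


# Usage: two hypothetical tasks request the same document.
store = PluginStore()
document = "PTMs are widely used in document-oriented NLP tasks."
plugin_for_qa = store.get("doc-1", document)
plugin_for_fact_checking = store.get("doc-1", document)
print("encoder calls:", store.encode_calls)
```

The pass counts above are a simplification that ignores the per-query cost of running the backbone with the plugin inserted, so they should not be read as reproducing the 69% figure reported in the abstract.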

Citations

Proceedings Article•10.48550/arXiv.2305.17331

Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In

Zichun Yu, +3 more

- 27 May 2023

TL;DR: In this article, an augmentation-adapted retriever (AAR) is proposed to assist target LMs that may not be known beforehand or cannot be fine-tuned together with the retriever.

31

Journal Article•10.48550/arXiv.2305.17691

Plug-and-Play Knowledge Injection for Pre-trained Language Models

Zhengyan Zhang, +10 more

- 28 May 2023

- arXiv.org

TL;DR: In this paper, a plug-and-play knowledge injection method, map-tuning, is proposed, which trains a mapping of knowledge embeddings to enrich model inputs with the mapped embeddings while keeping model parameters frozen.

6

Proceedings Article

Plug-and-Play Knowledge Injection for Pre-trained Language Models

Zhengyan Zhang, +10 more

TL;DR: In this paper, a plug-and-play knowledge injection method, map-tuning, is proposed, which trains a mapping of knowledge embeddings to enrich model inputs with the mapped embeddings while keeping model parameters frozen.

Preprint•10.48550/arxiv.2406.02642

E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory

Zhen Yang, +8 more

- 04 Jun 2024

TL;DR: E-ICL enhances fine-grained emotion recognition by addressing the limitations of ICL based on prototype theory. It utilizes more accurate prototypes and an exclusionary emotion prediction strategy to improve accuracy and robustness.
