Proceedings Article
DOI: 10.48550/arXiv.2305.17660
Plug-and-Play Document Modules for Pre-trained Models
Chaojun Xiao, Zhengyan Zhang, Xu Han, Chi-Min Chan, Yankai Lin, Zhiyuan Liu, Xiangyang Li, Zhonghua Li, Zhao Cao, Maosong Sun
- 28 May 2023
Vol. abs/2305.17660
TL;DR: The authors propose to represent each document as a plug-and-play document module, i.e., a document plugin, for pre-trained models (PTMs), so that a document is encoded once and reused across tasks.
Abstract: Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement means the same documents are encoded repeatedly for different tasks and queries, which is highly computationally inefficient. To address this, we aim to decouple document encoding from downstream tasks, and propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document once and handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders. Extensive experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios. In particular, PlugD can save 69% of computational costs while achieving performance comparable to state-of-the-art encoding-task coupling methods. Additionally, we show that PlugD can serve as an effective post-processing method to inject knowledge into task-specific models, improving model performance without any additional model training. Our code and checkpoints can be found at https://github.com/thunlp/Document-Plugin.
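The encode-once idea in the abstract can be illustrated with a toy sketch. This is not the PlugD implementation or its API: `DocumentPluginCache`, `toy_encode`, `run_task`, and the task names are hypothetical placeholders, and the "plugin" here is a short list of numbers rather than a representation inserted into a backbone PTM. The sketch only shows how decoupling document encoding from tasks lets several tasks share one encoding pass.

```python
from typing import Callable, Dict, List


class DocumentPluginCache:
    """Hypothetical cache: encode each document once, reuse the result for every task."""

    def __init__(self, encode: Callable[[str], List[float]]):
        self._encode = encode
        self._plugins: Dict[str, List[float]] = {}
        self.encoder_calls = 0  # counts how often the (expensive) encoder runs

    def plugin_for(self, doc_id: str, text: str) -> List[float]:
        # Encode only on the first request; later requests reuse the stored plugin.
        if doc_id not in self._plugins:
            self.encoder_calls += 1
            self._plugins[doc_id] = self._encode(text)
        return self._plugins[doc_id]


def toy_encode(text: str) -> List[float]:
    # Placeholder for a PTM document encoder (assumption): word count and character count.
    words = text.split()
    return [float(len(words)), float(sum(len(w) for w in words))]


def run_task(task: str, query: str, plugin: List[float]) -> str:
    # Placeholder for a task-specific model that consumes a precomputed document plugin.
    return f"{task}:{query}:{plugin[0]:.0f}"


cache = DocumentPluginCache(toy_encode)
doc = "PlugD represents each document as a plugin"
requests = [("qa", "who"), ("fact_check", "claim"), ("slot_filling", "entity")]
results = [run_task(task, query, cache.plugin_for("doc-1", doc)) for task, query in requests]
print(results, "encoder calls:", cache.encoder_calls)
```

In a coupled setup, the encoder in this toy would run once per (task, query) pair; here it runs once per document, regardless of how many tasks consume it.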
Citations
Augmentation-Adapted Retriever Improves Generalization of Language Models as Generic Plug-In
Zichun Yu, Chenyan Xiong, Shi Yu, Zhiyuan Liu
- 27 May 2023
TL;DR: This paper proposes an augmentation-adapted retriever (AAR) to assist target LMs that may not be known beforehand or cannot be fine-tuned jointly with the retriever.
Plug-and-Play Knowledge Injection for Pre-trained Language Models
Zhengyan Zhang, Zhiyuan Zeng, Yankai Lin, Huadong Wang, Deming Ye, Chaojun Xiao, Xu Han, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
TL;DR: This paper proposes map-tuning, a plug-and-play knowledge injection method that trains a mapping of knowledge embeddings to enrich model inputs with the mapped embeddings while keeping model parameters frozen.
E-ICL: Enhancing Fine-Grained Emotion Recognition through the Lens of Prototype Theory
Zhen Yang, Ren Zhang, C. Ye, Yufeng Wang, Hongwei Sun, Chao Chen, Xiaofei Zhu, Yunbing Wu, Xiangwen Liao
- 04 Jun 2024
TL;DR: E-ICL enhances fine-grained emotion recognition by addressing the limitations of ICL based on prototype theory. It utilizes more accurate prototypes and an exclusionary emotion prediction strategy to improve accuracy and robustness.
References
•Proceedings Article
Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
- 12 Jun 2017
TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms that dispenses with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
•Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT is a language representation model designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers; it can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher D. Manning
- 01 Oct 2014
TL;DR: GloVe is a global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
•Posted Content
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Michael Lewis, Luke Zettlemoyer, Veselin Stoyanov
TL;DR: The study finds that BERT was significantly undertrained and, with an improved training recipe, can match or exceed the performance of every model published after it; the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
•Posted Content
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents extensions to the Skip-gram model, which learns high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships; the extensions improve both the quality of the vectors and the training speed.