GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers

doi:10.48550/arXiv.2205.03286

Proceedings Article•10.48550/arXiv.2205.03286

A. Modarressi, +3 more

- 06 May 2022

Vol. abs/2205.03286

22

TL;DR: GlobEnc is a token attribution analysis method that incorporates all the components of the encoder block and aggregates the resulting attributions across layers. It significantly outperforms previous methods on various tasks in terms of correlation with gradient-based saliency scores.
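
The two ideas in this summary, aggregating per-layer attributions and checking them against saliency scores, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each layer already yields an n x n token attribution matrix (random placeholders here), uses rollout-style matrix multiplication as the aggregation, and uses Spearman rank correlation as the comparison. The function names `rollout` and `spearman` and all input values are hypothetical.

```python
import numpy as np

def rollout(layer_attributions):
    """Aggregate per-layer (n x n) attribution matrices by multiplying them in order."""
    n = layer_attributions[0].shape[0]
    agg = np.eye(n)
    for attr in layer_attributions:
        attr = np.asarray(attr, dtype=float)
        # Row-normalize so each token's attributions over its inputs sum to 1.
        attr = attr / attr.sum(axis=-1, keepdims=True)
        agg = attr @ agg
    return agg

def spearman(x, y):
    """Spearman rank correlation (assumes no tied values)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Placeholder inputs: 3 layers, 5 tokens. Real matrices would come from a model.
rng = np.random.default_rng(0)
layers = [rng.random((5, 5)) for _ in range(3)]
global_attr = rollout(layers)

# Row 0 stands in for the classification token's attribution to each input token.
cls_scores = global_attr[0]
saliency = rng.random(5)  # placeholder for gradient-based saliency scores
rho = spearman(cls_scores, saliency)
print(rho)
```

Because each per-layer matrix is row-normalized, the aggregated matrix also has rows that sum to 1, so the scores for a given output token can be read as a distribution over input tokens.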

Citations

Journal Article•10.48550/arxiv.2401.12874

From Understanding to Utilization: A Survey on Explainability for Large Language Models

Haoyan Luo, +1 more

- 23 Jan 2024

- arXiv.org

TL;DR: This survey underscores the imperative for increased explainability in LLMs, delving into both the research on explainability and the various methodologies and tasks that utilize an understanding of these models.

16

Proceedings Article•10.48550/arXiv.2301.12971

Quantifying Context Mixing in Transformers

Hosein Mohebbi, +3 more

- 30 Jan 2023

TL;DR: The authors propose Value Zeroing, a context mixing score customized for Transformers that provides a deeper understanding of how information is mixed at each encoder layer, and demonstrate the superiority of their context mixing scores over other analysis methods through a series of complementary evaluations with different viewpoints based on linguistically informed rationales, probing, and faithfulness analysis.

14

Posted Content•10.48550/arxiv.2302.00456

Feed-Forward Blocks Control Contextualization in Masked Language Models

01 Feb 2023

TL;DR: This article analyzed the inner contextualization of Transformer-based models by considering all the components, including the feed-forward block and its surrounding residual and normalization layers, as well as the attention.

12

Proceedings Article•10.48550/arXiv.2306.02873

DecompX: Explaining Transformers Decisions by Propagating Token Decomposition

A. Modarressi, +4 more

- 05 Jun 2023

TL;DR: DecompX is based on constructing decomposed token representations and propagating them successively through the model without mixing them between layers. It offers several advantages over existing solutions because it includes all encoder components and the classification head.

9

Journal Article•10.47852/bonviewjdsis32021131

Establishing An Optimal Online Phishing Detection Method: Evaluating Topological NLP Transformers on Text Message Data

Helen Milner

- 12 Jul 2023

TL;DR: This article establishes an optimal classification model for online SMS spam detection using topological sentence transformer methodologies. Butler et al. presented a viable lightweight integration of pre-trained NLP repository models with sklearn functionality, which achieved an optimal F1-score of 0.938.

5

References

•Proceedings Article

Attention is All you Need

Ashish Vaswani, +7 more

- 12 Jun 2017

TL;DR: This paper proposes the Transformer, a simple network architecture based solely on attention mechanisms that dispenses with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.

94.2K

•Posted Content

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, +3 more

- 11 Oct 2018

- arXiv: Computation and Language

TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.

81.7K

•Proceedings Article

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Richard Socher, +6 more

- 01 Oct 2013

TL;DR: This paper introduces a Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality. It also introduces the Recursive Neural Tensor Network.

8.8K

•Proceedings Article

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps

Karen Simonyan, +2 more

- 23 Dec 2013

TL;DR: In this paper, a class saliency map is computed from the gradient of the class score with respect to the input image. Such maps can be used for weakly supervised object segmentation using classification ConvNets.

7.5K

•Posted Content

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Victor Sanh, +3 more

- 02 Oct 2019

- arXiv: Computation and Language

TL;DR: This work proposes a method to pre-train a smaller general-purpose language representation model, called DistilBERT, which can be fine-tuned with good performances on a wide range of tasks like its larger counterparts, and introduces a triple loss combining language modeling, distillation and cosine-distance losses.

7.3K
