Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability
Abstract
A specialized LRP method for Transformer explainability incorporates positional encoding, improving relevance propagation and outperforming existing methods.
The development of effective explainability tools for Transformers is a crucial pursuit in deep learning research. One of the most promising approaches in this domain is Layer-wise Relevance Propagation (LRP), which propagates relevance scores backward through the network to the input space by redistributing activation values according to predefined rules. However, existing LRP-based methods for Transformer explainability entirely overlook a critical component of the Transformer architecture: its positional encoding (PE). This omission violates the conservation property and discards an important and unique type of relevance, one associated with structural and positional features. To address this limitation, we reformulate the input space for Transformer explainability as a set of position-token pairs. This allows us to propose specialized, theoretically grounded LRP rules designed to propagate attributions across various positional encoding methods, including Rotary, Learnable, and Absolute PE. Extensive experiments with both fine-tuned classifiers and zero-shot foundation models, such as LLaMA 3, demonstrate that our method significantly outperforms the state of the art on both vision and NLP explainability tasks. Our code is publicly available.
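As a concrete reference for how LRP redistributes relevance, and why an additive PE term can claim a share of it, here is a minimal NumPy sketch: the standard epsilon-LRP rule for a linear layer, plus a naive proportional split of relevance between a token embedding and an additive (absolute or learnable) positional encoding. The function names, the epsilon stabilizer, and the proportional split are illustrative assumptions; this is not the paper's PE-aware rules, which also handle Rotary PE.

```python
import numpy as np

def lrp_epsilon_linear(x, W, b, R_out, eps=1e-6):
    """Standard epsilon-LRP rule for a linear layer y = W @ x + b.

    Redistributes the output relevance R_out to the inputs x in
    proportion to each input's contribution z_ij = W_ij * x_j.
    """
    z = W * x[None, :]                       # per-input contributions z_ij
    denom = z.sum(axis=1) + b                # forward pre-activations
    denom = denom + np.where(denom >= 0, eps, -eps)  # epsilon stabilizer
    s = R_out / denom                        # relevance per unit of activation
    R_in = (z * s[:, None]).sum(axis=0)      # redistribute back to inputs
    return R_in

def split_token_position_relevance(tok_emb, pos_emb, R, eps=1e-9):
    """Illustrative split of relevance between a token embedding and an
    additive positional encoding, proportional to each component's share
    of the summed input h = tok_emb + pos_emb.

    This is only a conceptual sketch of attributing relevance to
    position-token pairs, not the paper's exact PE-aware LRP rules.
    """
    h = tok_emb + pos_emb
    R_tok = R * tok_emb / (h + eps)
    R_pos = R * pos_emb / (h + eps)
    return R_tok, R_pos  # conservation: R_tok + R_pos ~= R (up to eps)
```

In this sketch, dropping the positional term (as existing LRP variants effectively do) would leave R_pos unaccounted for, which is exactly the conservation violation the abstract describes.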
Community
Hi!
We present a state-of-the-art attribution method for Transformers and LLMs, providing significantly more faithful explanations than previous methods, particularly for concepts associated with structural and positional features. A user-friendly, open-source implementation with XAI demos is available at https://github.com/YardenBakish/PE-AWARE-LRP.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- GMAR: Gradient-Driven Multi-Head Attention Rollout for Vision Transformer Interpretability (2025)
- ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention (2025)
- Is Attention Required for Transformer Inference? Explore Function-preserving Attention Replacement (2025)
- Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models (2025)
- PaTH Attention: Position Encoding via Accumulating Householder Transformations (2025)
- Enhancing Transformers Through Conditioned Embedded Tokens (2025)
- UMoE: Unifying Attention and FFN with Shared Experts (2025)