Paper Title
Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
Paper Authors
Paper Abstract
Transformer-based language models have significantly advanced the state of the art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM), a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's predictions. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
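The abstract describes scoring input elements by combining self-attention weights with their gradients. A minimal sketch of that idea, assuming (the exact gating and aggregation used by Grad-SAM are not given here) that attention maps are gated by the non-negative part of their gradients and averaged into a per-token relevance score:

```python
import numpy as np

def grad_sam_scores(attentions, gradients):
    """Gradient-weighted self-attention scoring (illustrative sketch).

    attentions: attention weights, shape (layers, heads, seq, seq)
    gradients:  d(prediction)/d(attention), same shape

    Returns one relevance score per input token (length seq).
    """
    # Gate each attention weight by the ReLU of its gradient, so only
    # attention that positively contributes to the prediction is kept.
    weighted = attentions * np.maximum(gradients, 0.0)
    # Average over layers, heads, and query positions to obtain a
    # per-key-token score.
    return weighted.mean(axis=(0, 1, 2))

# Toy example: 2 layers, 2 heads, 3 tokens.
attn = np.full((2, 2, 3, 3), 0.5)
grads = np.ones((2, 2, 3, 3))
scores = grad_sam_scores(attn, grads)
```

The function names and the choice to average uniformly across layers and heads are assumptions for illustration, not the paper's exact formulation.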