Paper Title
Grad-SAM: Explaining Transformers via Gradient Self-Attention Maps
Paper Authors
Paper Abstract
Transformer-based language models have significantly advanced the state of the art in many linguistic tasks. As this revolution continues, the ability to explain model predictions has become a major area of interest for the NLP community. In this work, we present Gradient Self-Attention Maps (Grad-SAM), a novel gradient-based method that analyzes self-attention units and identifies the input elements that best explain the model's predictions. Extensive evaluations on various benchmarks show that Grad-SAM obtains significant improvements over state-of-the-art alternatives.
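The abstract describes scoring input elements by combining self-attention weights with their gradients. A minimal sketch of that idea, assuming (the exact gating and aggregation used by Grad-SAM are not given here) that attention maps are gated by the non-negative part of their gradients and averaged into a per-token relevance score:

```python
import numpy as np

def grad_sam_scores(attentions, gradients):
    """Gradient-weighted self-attention scoring (illustrative sketch).

    attentions: attention weights, shape (layers, heads, seq, seq)
    gradients:  d(prediction)/d(attention), same shape

    Returns one relevance score per input token (length seq).
    """
    # Gate each attention weight by the ReLU of its gradient, so only
    # attention that positively contributes to the prediction is kept.
    weighted = attentions * np.maximum(gradients, 0.0)
    # Average over layers, heads, and query positions to obtain a
    # per-key-token score.
    return weighted.mean(axis=(0, 1, 2))

# Toy example: 2 layers, 2 heads, 3 tokens.
attn = np.full((2, 2, 3, 3), 0.5)
grads = np.ones((2, 2, 3, 3))
scores = grad_sam_scores(attn, grads)
```

The function names and the choice to average uniformly across layers and heads are assumptions for illustration, not the paper's exact formulation.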