Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations

Paper title

Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations

Paper authors

Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

Abstract

A critical problem in the field of post hoc explainability is the lack of a common foundational goal among methods. For example, some methods are motivated by function approximation, some by game theoretic notions, and some by obtaining clean visualizations. This fragmentation of goals causes not only an inconsistent conceptual understanding of explanations but also the practical challenge of not knowing which method to use when. In this work, we begin to address these challenges by unifying eight popular post hoc explanation methods (LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradients x Input, SmoothGrad, and Integrated Gradients). We show that these methods all perform local function approximation of the black-box model, differing only in the neighbourhood and loss function used to perform the approximation. This unification enables us to (1) state a no free lunch theorem for explanation methods, demonstrating that no method can perform optimally across all neighbourhoods, and (2) provide a guiding principle to choose among methods based on faithfulness to the black-box model. We empirically validate these theoretical results using various real-world datasets, model classes, and prediction tasks. By bringing diverse explanation methods into a common framework, this work (1) advances the conceptual understanding of these methods, revealing their shared local function approximation objective, properties, and relation to one another, and (2) guides the use of these methods in practice, providing a principled approach to choose among methods and paving the way for the creation of new ones.
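
To make the shared local function approximation objective concrete, here is a minimal Python sketch. It rests on several assumptions that do not come from the paper: a hypothetical two-feature black-box model f(x) = sin(x0) + x0 * x1 with a known analytic gradient; a C-LIME-style explanation implemented as an ordinary least-squares linear fit over isotropic Gaussian perturbations (squared loss, no extra kernel weighting); and SmoothGrad computed as the mean gradient over the same kind of noise. The helper names (grad_f, solve, local_linear_fit, smoothgrad) are illustrative only. The sketch shows that one procedure, fitting a linear surrogate locally, returns different explanations as the neighbourhood changes: with a small noise scale sigma the fitted slope should be close to the Vanilla Gradients explanation, while with sigma = 1 it should instead be close to the SmoothGrad average.

```python
import math
import random


def f(x):
    # Hypothetical toy "black-box" model, chosen only because its gradient is known.
    return math.sin(x[0]) + x[0] * x[1]


def grad_f(x):
    # Analytic gradient of f: the Vanilla Gradients explanation at x.
    return [math.cos(x[0]) + x[1], x[0]]


def solve(a, b):
    # Solve the small linear system a @ w = b by Gaussian elimination with partial pivoting.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def local_linear_fit(x, sigma, n_samples=10000, seed=0):
    # C-LIME-style sketch (assumption): neighbourhood = x + xi with xi ~ N(0, sigma^2 I),
    # loss = squared error of the linear surrogate g(x + xi) = a + w . xi.
    rng = random.Random(seed)
    d = len(x)
    # Accumulate the normal equations X^T X and X^T y for design rows [1, xi_1, ..., xi_d].
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, sigma) for _ in range(d)]
        row = [1.0] + xi
        y = f([xj + e for xj, e in zip(x, xi)])
        for i in range(d + 1):
            xty[i] += row[i] * y
            for j in range(d + 1):
                xtx[i][j] += row[i] * row[j]
    coef = solve(xtx, xty)
    return coef[1:]  # drop the intercept a; w is the feature attribution


def smoothgrad(x, sigma, n_samples=10000, seed=0):
    # SmoothGrad: mean of the gradient over Gaussian perturbations of x.
    rng = random.Random(seed)
    d = len(x)
    total = [0.0] * d
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, sigma) for _ in range(d)]
        g = grad_f([xj + e for xj, e in zip(x, xi)])
        total = [t + gi for t, gi in zip(total, g)]
    return [t / n_samples for t in total]


if __name__ == "__main__":
    x = [1.0, 2.0]
    print("vanilla gradient:     ", grad_f(x))
    print("local fit, sigma=0.01:", local_linear_fit(x, sigma=0.01))
    print("local fit, sigma=1.0: ", local_linear_fit(x, sigma=1.0))
    print("smoothgrad, sigma=1.0:", smoothgrad(x, sigma=1.0))
```

For isotropic Gaussian noise the agreement at sigma = 1 is expected rather than coincidental. By Stein's lemma, the population least-squares slope equals the expected gradient over the noise, so the two estimates differ only by sampling error. The sketch sticks to the standard library and solves the small normal-equation system by hand, so it runs without NumPy.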
