Concept Activation Regions: A Generalized Framework For Concept-Based Explanations

Paper Title

Concept Activation Regions: A Generalized Framework For Concept-Based Explanations

Authors

Jonathan Crabbé, Mihaela van der Schaar

Abstract

Concept-based explanations make it possible to understand the predictions of a deep neural network (DNN) through the lens of concepts specified by users. Existing methods assume that the examples illustrating a concept are mapped in a fixed direction of the DNN's latent space. When this holds true, the concept can be represented by a concept activation vector (CAV) pointing in that direction. In this work, we propose to relax this assumption by allowing concept examples to be scattered across different clusters in the DNN's latent space. Each concept is then represented by a region of the DNN's latent space that includes these clusters, which we call a concept activation region (CAR). To formalize this idea, we introduce an extension of the CAV formalism that is based on the kernel trick and support vector classifiers. This CAR formalism yields global concept-based explanations and local concept-based feature importance. We prove that CAR explanations built with radial kernels are invariant under latent space isometries. In this way, CAR assigns the same explanations to latent spaces that have the same geometry. We further demonstrate empirically that CARs offer (1) more accurate descriptions of how concepts are scattered in the DNN's latent space; (2) global explanations that are closer to human concept annotations; and (3) concept-based feature importance that meaningfully relates concepts with each other. Finally, we use CARs to show that DNNs can autonomously rediscover known scientific concepts, such as the prostate cancer grading system.
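
To make the contrast between a fixed direction and a region concrete, here is a minimal sketch. It is not the authors' implementation. It assumes NumPy and scikit-learn, and it uses made-up 2-D "latent representations" in place of real DNN activations. The cluster positions, the `gamma` value, and all variable names are illustrative assumptions. A linear support vector classifier stands in for the CAV-style assumption, and an RBF-kernel support vector classifier stands in for a CAR-style region. Accuracy is measured on the training points only, to keep the sketch short.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical latent representations (2-D for readability).
# Concept-positive examples form two clusters on either side of the
# concept-negative examples, so no single direction separates them.
pos = np.vstack([
    rng.normal(loc=[-3.0, 0.0], scale=0.3, size=(100, 2)),
    rng.normal(loc=[3.0, 0.0], scale=0.3, size=(100, 2)),
])
neg = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2))
H = np.vstack([pos, neg])
y = np.concatenate([np.ones(200), np.zeros(200)])

# CAV-style assumption: one linear boundary (a single direction).
linear_clf = SVC(kernel="linear").fit(H, y)
# CAR-style assumption: a region obtained with a radial (RBF) kernel.
rbf_clf = SVC(kernel="rbf", gamma=1.0).fit(H, y)

linear_acc = linear_clf.score(H, y)
rbf_acc = rbf_clf.score(H, y)
print(f"linear accuracy: {linear_acc:.2f}, RBF accuracy: {rbf_acc:.2f}")

# Isometry check: rotate and translate the latent space, then refit.
# An RBF kernel depends only on pairwise distances, which an isometry
# preserves, so the fitted region should label every point the same way.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
shift = np.array([5.0, -2.0])
H_iso = H @ R.T + shift
rbf_iso = SVC(kernel="rbf", gamma=1.0).fit(H_iso, y)

same_predictions = np.array_equal(rbf_clf.predict(H), rbf_iso.predict(H_iso))
print("same predictions after isometry:", same_predictions)
```

On data laid out like this, a single half-plane cannot contain both positive clusters while excluding the negatives, whereas the radial-kernel classifier can enclose each cluster separately. The final check only compares predicted labels before and after the isometry. It illustrates the invariance property stated in the abstract and is not a proof of it.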
