Paper Title
ConceptDistil: Model-Agnostic Distillation of Concept Explanations
Paper Authors
Paper Abstract
Concept-based explanations aim to fill the model-interpretability gap for non-technical humans in the loop. Previous work has focused on providing concepts for specific models (e.g., neural networks) or data types (e.g., images), either by extracting concepts from an already-trained network or by training self-explainable models through multi-task learning. In this work, we propose ConceptDistil, a method that brings concept explanations to any black-box classifier using knowledge distillation. ConceptDistil decomposes into two components: (1) a concept model that predicts which domain concepts are present in a given instance, and (2) a distillation model that tries to mimic the predictions of the black-box model using the concept model's predictions. We validate ConceptDistil in a real-world use case, showing that it is able to optimize both tasks, bringing concept-explainability to any black-box model.
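The two-component setup described in the abstract can be sketched minimally as follows. This is not the authors' implementation; it is an illustrative toy in NumPy where `black_box` stands in for an arbitrary pretrained classifier, a fixed linear-sigmoid layer stands in for the concept model, and the distillation model is a linear surrogate fit to the black box's soft outputs using only the concept scores, so its weights relate concepts to the mimicked prediction. All names and the training loop are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box classifier to be explained (assumption: any
# callable that returns a probability per instance would work here).
def black_box(x):
    return 1.0 / (1.0 + np.exp(-(x @ np.array([1.5, -2.0, 0.5]))))

# (1) Concept model: predicts which domain concepts are present in an
# instance. A fixed linear-sigmoid layer stands in for a trained network.
W_concept = rng.normal(size=(3, 2))        # 3 raw features -> 2 concepts

def concept_model(x):
    return 1.0 / (1.0 + np.exp(-(x @ W_concept)))

X = rng.normal(size=(200, 3))              # toy instances
y = black_box(X)                           # soft labels from the black box
C = concept_model(X)                       # concept scores per instance

# (2) Distillation model: mimics the black box from concept scores only,
# here a linear model fit by gradient descent on squared error.
w_distil = np.zeros(2)
for _ in range(500):
    pred = C @ w_distil
    grad = C.T @ (pred - y) / len(X)       # mean-squared-error gradient
    w_distil -= 0.5 * grad

# A concept-level explanation for one instance: each concept's
# contribution to the surrogate's mimicked prediction.
contributions = C[0] * w_distil
```

Because the distillation model sees only concept predictions, its weights (and per-instance `contributions`) attribute the mimicked decision to human-understandable concepts, which is the source of the method's model-agnostic concept explanations.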