Towards explainable classifiers using the counterfactual approach - global explanations for discovering bias in data

Paper title

Towards explainable classifiers using the counterfactual approach -- global explanations for discovering bias in data

Authors

Agnieszka Mikołajczyk, Michał Grochowski, Arkadiusz Kwasigroch

Abstract

The paper proposes summarized attribution-based post-hoc explanations for detecting and identifying bias in data. It proposes a global explanation and introduces a step-by-step framework for detecting and testing bias. Because removing unwanted bias is often a complicated and laborious task, the bias is instead inserted automatically and then evaluated with the proposed counterfactual approach. The results are validated on a sample skin lesion dataset. Using the proposed method, a number of possible bias-causing artifacts are successfully identified and confirmed in dermoscopy images. In particular, black frames are confirmed to have a strong influence on the Convolutional Neural Network's prediction: 22% of them changed the prediction from benign to malignant.
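The idea of inserting a bias and checking counterfactually whether predictions flip can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the helper names `insert_black_frame` and `benign_to_malignant_rate`, the circular frame shape, the `radius_frac` parameter, the label convention (0 = benign, 1 = malignant), and the choice of denominator (images originally predicted benign) are all assumptions made for the example.

```python
import numpy as np

def insert_black_frame(image, radius_frac=0.45):
    """Return a copy of `image` (H, W) or (H, W, C) with every pixel
    outside a centred circle set to 0 (a hypothetical black-frame artifact)."""
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = radius_frac * min(h, w)
    # Boolean (H, W) mask of pixels lying outside the circle.
    outside = (yy - cy) ** 2 + (xx - cx) ** 2 > radius ** 2
    framed = image.copy()
    framed[outside] = 0
    return framed

def benign_to_malignant_rate(predict, images, insert_bias=insert_black_frame):
    """Fraction of images predicted benign (0) whose prediction becomes
    malignant (1) once the bias is inserted. `predict` is any callable
    mapping one image to a 0/1 label (assumed interface)."""
    original = np.array([predict(img) for img in images])
    biased = np.array([predict(insert_bias(img)) for img in images])
    benign = original == 0
    if not benign.any():
        return 0.0
    return float(np.mean(biased[benign] == 1))

# Toy usage: a stand-in "classifier" that calls dark images malignant.
toy_images = [np.full((32, 32, 3), v, dtype=np.float32) for v in (0.9, 0.4, 0.2)]
toy_predict = lambda img: int(img.mean() < 0.3)
rate = benign_to_malignant_rate(toy_predict, toy_images)
```

With a real model, `predict` would wrap the trained network, and a high flip rate would suggest that the inserted artifact, rather than the lesion itself, drives the prediction.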
