Paper title
Towards learning to explain with concept bottleneck models: mitigating information leakage
Authors
Abstract
Concept bottleneck models perform classification by first predicting which of a list of human-provided concepts are true of a datapoint. A downstream model then uses these predicted concept labels to predict the target label, so the predicted concepts act as a rationale for the target prediction. Model trust issues emerge in this paradigm when soft concept labels are used: it has previously been observed that extra information about the data distribution leaks into the concept predictions. In this work we show how Monte-Carlo Dropout can be used to obtain soft concept predictions that do not contain leaked information.
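
The two-stage pipeline described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give architectural or training details, so everything below is an assumption made for illustration: the network sizes, the random untrained weights, the helper names (`concept_logits`, `mc_dropout_soft_concepts`), and in particular the choice to threshold each stochastic pass to hard concept labels and average them. It shows generic Monte-Carlo Dropout applied to a concept predictor, not necessarily the procedure used in the paper, and it makes no claim about how much leakage it removes.

```python
import numpy as np

def concept_logits(x, W1, b1, W2, b2, rng, p_drop):
    """One stochastic forward pass of a small concept predictor.

    Dropout stays active at prediction time, which is what makes
    repeated passes differ (the Monte-Carlo Dropout idea).
    """
    h = np.maximum(0.0, x @ W1 + b1)           # ReLU hidden layer
    if p_drop > 0.0:
        mask = rng.random(h.shape) >= p_drop   # keep each unit with prob 1 - p_drop
        h = h * mask / (1.0 - p_drop)          # inverted-dropout rescaling
    return h @ W2 + b2                         # one logit per concept

def mc_dropout_soft_concepts(x, params, rng, p_drop=0.5, n_passes=100):
    """Average hard (0/1) concept predictions over many dropout passes.

    Assumption of this sketch: the soft score for a concept is the
    fraction of passes in which the concept was predicted present.
    """
    hard = np.stack([
        (concept_logits(x, *params, rng, p_drop) > 0.0).astype(float)
        for _ in range(n_passes)
    ])
    return hard.mean(axis=0)

# Hypothetical sizes and random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
n_features, n_hidden, n_concepts, n_classes = 8, 16, 4, 3
params = (
    rng.normal(size=(n_features, n_hidden)), np.zeros(n_hidden),
    rng.normal(size=(n_hidden, n_concepts)), np.zeros(n_concepts),
)
V = rng.normal(size=(n_concepts, n_classes))   # downstream linear label predictor

x = rng.normal(size=(5, n_features))           # batch of 5 datapoints
soft = mc_dropout_soft_concepts(x, params, rng)
target_logits = soft @ V                       # label predictor sees concepts only
pred = target_logits.argmax(axis=1)
```

Here `soft` has one row per datapoint and one column per concept, and each entry is a frequency of hard predictions across passes rather than a raw sigmoid output. The downstream predictor receives only these concept scores, never the input features, which is the bottleneck structure the abstract describes.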