Paper Title
Label-Free Explainability for Unsupervised Models
Paper Authors
Paper Abstract
Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box's output to interpret. In the absence of labels, black-box outputs are often representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem. To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance, which respectively highlight the influential features and training examples a black-box uses to construct representations at inference time. We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing feature and example importance methods. We illustrate the utility of our label-free explainability paradigm through a qualitative and quantitative comparison of representation spaces learned by various autoencoders trained on distinct unsupervised tasks.
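The abstract states that the label-free extensions can be implemented as simple wrappers around existing importance methods, but does not spell out the mechanics. Below is a minimal sketch of one way such a wrapper could look for label-free feature importance: it turns the vector-valued encoder output into a scalar via an inner product with the fixed representation of the input being explained, so that a standard gradient-based attribution (here, an integrated-gradients-style approximation) can be reused without labels. The function name, the inner-product proxy, and the toy encoder are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a label-free feature-importance wrapper (illustrative only).
# Assumption: the scalar proxy is <f(x_perturbed), f(x)>, i.e. the inner product
# between a perturbed input's representation and the fixed representation of x.
import torch
import torch.nn as nn


def label_free_feature_importance(encoder: nn.Module,
                                  x: torch.Tensor,
                                  baseline: torch.Tensor,
                                  n_steps: int = 50) -> torch.Tensor:
    """Integrated-gradients-style attribution for a label-free encoder.

    encoder:  black-box mapping inputs to representation vectors (no labels).
    x:        input to explain, shape (1, *input_dims).
    baseline: reference input with the same shape as x (e.g. all zeros).
    Returns per-feature importance scores with the shape of x.
    """
    encoder.eval()
    with torch.no_grad():
        target_repr = encoder(x)  # fixed representation h = f(x), treated as constant

    total_grad = torch.zeros_like(x)
    for step in range(1, n_steps + 1):
        alpha = step / n_steps
        point = (baseline + alpha * (x - baseline)).clone().requires_grad_(True)
        # Scalar proxy: <f(point), f(x)> lets any gradient-based attribution be
        # applied even though the encoder output is a vector, not a class score.
        proxy = (encoder(point) * target_repr).sum()
        grad, = torch.autograd.grad(proxy, point)
        total_grad += grad

    # Riemann approximation of the path integral of the proxy's gradients.
    return (x - baseline) * total_grad / n_steps


# Usage with a toy encoder standing in for any pretrained autoencoder encoder:
if __name__ == "__main__":
    toy_encoder = nn.Sequential(nn.Linear(10, 4), nn.ReLU(), nn.Linear(4, 3))
    x = torch.randn(1, 10)
    scores = label_free_feature_importance(toy_encoder, x, torch.zeros_like(x))
    print(scores.shape)  # torch.Size([1, 10])
```

The same wrapping idea would apply to perturbation-based attribution methods or to example-importance methods; only the scalar proxy fed to the underlying method changes.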