Paper Title

Post-hoc Concept Bottleneck Models

Authors

Mert Yuksekgonul, Maggie Wang, James Zou

Abstract

Concept Bottleneck Models (CBMs) map the inputs onto a set of interpretable concepts (``the bottleneck'') and use the concepts to make predictions. A concept bottleneck enhances interpretability since it can be investigated to understand what concepts the model "sees" in an input and which of these concepts are deemed important. However, CBMs are restrictive in practice as they require dense concept annotations in the training data to learn the bottleneck. Moreover, CBMs often do not match the accuracy of an unrestricted neural network, reducing the incentive to deploy them in practice. In this work, we address these limitations of CBMs by introducing Post-hoc Concept Bottleneck Models (PCBMs). We show that we can turn any neural network into a PCBM without sacrificing model performance while still retaining the interpretability benefits. When concept annotations are not available on the training data, we show that PCBMs can transfer concepts from other datasets or from natural-language descriptions of concepts via multimodal models. A key benefit of PCBMs is that they enable users to quickly debug and update the model to reduce spurious correlations and improve generalization to new distributions. PCBMs allow for global model edits, which can be more efficient than prior local interventions that fix a specific prediction. Through a model-editing user study, we show that editing PCBMs via concept-level feedback can provide significant performance gains without using data from the target domain or retraining the model.
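The abstract describes the PCBM recipe at a high level: project a frozen backbone's embeddings onto concept vectors to form the bottleneck, fit a sparse linear head on the concept scores, and edit the model globally by pruning concept weights. Below is a minimal sketch of that pipeline, assuming precomputed backbone embeddings and a concept matrix `C` obtained, for example, from concept activation vectors or CLIP text embeddings; the function names (`concept_scores`, `fit_pcbm_head`, `prune_concept`) and hyperparameters are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal PCBM sketch (illustrative; not the authors' reference code).
import numpy as np
from sklearn.linear_model import SGDClassifier

def concept_scores(X_emb, C):
    """Project backbone embeddings onto unit-norm concept vectors.

    X_emb: [n_samples, d] frozen-backbone embeddings.
    C:     [n_concepts, d] concept vectors (e.g., CAVs or CLIP text embeddings).
    Returns the concept-score matrix, i.e., the "bottleneck" representation.
    """
    C_unit = C / np.linalg.norm(C, axis=1, keepdims=True)
    return X_emb @ C_unit.T  # [n_samples, n_concepts]

def fit_pcbm_head(X_emb, y, C):
    """Fit a sparse linear classifier on concept scores.

    An elastic-net penalty keeps the per-class weights sparse, so each
    prediction depends on a small, readable set of concepts.
    """
    Z = concept_scores(X_emb, C)
    head = SGDClassifier(loss="log_loss", penalty="elasticnet",
                         alpha=1e-4, l1_ratio=0.5, max_iter=1000)
    head.fit(Z, y)
    return head

def prune_concept(head, concept_idx):
    """Global model edit: zero a concept's weight for every class at once,
    e.g., to remove a spurious background concept flagged by a user."""
    head.coef_[:, concept_idx] = 0.0
    return head
```

Zeroing a concept's weight is one form of the global edit the abstract mentions: unlike a test-time intervention that corrects a single prediction, it changes the classifier's behavior on all future inputs without target-domain data or retraining.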
