Paper Title
Contextual Semantic Interpretability
Paper Authors
Paper Abstract
Convolutional neural networks (CNNs) are known to learn an image representation that captures concepts relevant to the task, but they do so in an implicit way that hampers model interpretability. However, one could argue that such a representation is hidden in the neurons and can be made explicit by teaching the model to recognize semantically interpretable attributes that are present in the scene. We call such an intermediate layer a \emph{semantic bottleneck}. Once the attributes are learned, they can be re-combined to reach the final decision and provide both an accurate prediction and an explicit reasoning behind the CNN decision. In this paper, we look into semantic bottlenecks that capture context: we want attributes to be arranged in groups of a few meaningful elements that participate jointly in the final decision. We use a two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups, allowing them to contribute differently to the final output depending on the context. We test our contextual semantic interpretable bottleneck (CSIB) on the task of landscape scenicness estimation and train the semantic interpretable bottleneck using an auxiliary database (SUN Attributes). When applied to a real-world test set of Flickr images, our model yields predictions as accurate as those of a non-interpretable baseline, all while providing clear and interpretable explanations for each prediction.
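The abstract describes a two-layer bottleneck placed on top of a CNN: a first layer predicting interpretable attributes and a second layer that gathers them into sparse, contextual groups before the final scenicness score. Below is a minimal PyTorch sketch of such an architecture, assuming a ResNet-50 backbone, 102 SUN Attributes, and a hypothetical number of groups; the paper's exact grouping and sparsity mechanism may differ.

```python
# Minimal sketch of a contextual semantic interpretable bottleneck (CSIB).
# Layer sizes, the backbone choice, and the sparsity handling are assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class CSIB(nn.Module):
    def __init__(self, num_attributes=102, num_groups=8):
        super().__init__()
        # CNN backbone producing an image representation.
        backbone = models.resnet50(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])

        # Semantic bottleneck: predict interpretable attributes
        # (e.g. the SUN Attributes) from the image features.
        self.attributes = nn.Linear(2048, num_attributes)

        # Second bottleneck layer: gather attributes into a few groups;
        # sparsity regularization would be applied to these weights so that
        # each group depends on a small, interpretable set of attributes.
        self.groups = nn.Linear(num_attributes, num_groups)

        # Final linear re-combination of group activations into one score.
        self.score = nn.Linear(num_groups, 1)

    def forward(self, x):
        f = self.features(x).flatten(1)          # image representation
        a = torch.sigmoid(self.attributes(f))    # attribute probabilities
        g = self.groups(a)                       # contextual group activations
        return self.score(g), a, g               # prediction + explanations


# Usage sketch: attribute and group activations explain each prediction.
model = CSIB()
score, attrs, groups = model(torch.randn(1, 3, 224, 224))
```

In this reading, the returned attribute probabilities and group activations are what makes each prediction inspectable: one can see which attributes fired and how each group contributed to the final score.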