Paper Title
A Disentangling Invertible Interpretation Network for Explaining Latent Representations
Paper Authors
Abstract
Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance is black-box models whose hidden representations lack interpretability: since distributed coding is optimal for latent layers to improve their robustness, attributing meaning to parts of a hidden feature vector or to individual neurons is hindered. We formulate interpretation as a translation of hidden representations onto semantic concepts that are comprehensible to the user. The mapping between both domains has to be bijective so that semantic modifications in the target domain correctly alter the original representation. The proposed invertible interpretation network can be transparently applied on top of existing architectures with no need to modify or retrain them. Consequently, we translate an original representation to an equivalent yet interpretable one, and back, without affecting the expressiveness and performance of the original. The invertible interpretation network disentangles the hidden representation into separate, semantically meaningful concepts. Moreover, we present an efficient approach to defining semantic concepts by merely sketching two images, as well as an unsupervised strategy. Experimental evaluation demonstrates wide applicability to the interpretation of existing classification and image generation networks, as well as to semantically guided image manipulation.
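The bijectivity requirement in the abstract can be illustrated with a toy sketch. The snippet below is not the authors' implementation; it uses a RealNVP-style additive coupling layer (a standard exactly-invertible building block) with a fixed random linear map standing in for the learned coupling network, applied to a pretend hidden representation `z`. It shows the two properties the abstract relies on: the translation to the factor space `t` is exactly invertible, and editing a single factor in `t` and inverting yields a modified original representation.

```python
import numpy as np

class AdditiveCoupling:
    """Toy invertible layer (RealNVP-style additive coupling).

    Splits the latent z into halves (z1, z2) and shifts z2 by a
    function of z1. The inverse subtracts the same shift, so the
    mapping is exactly bijective -- the property required to
    translate hidden representations back and forth losslessly.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a learned coupling network m(z1):
        # a fixed random linear map (hypothetical, for illustration).
        self.W = rng.standard_normal((dim // 2, dim - dim // 2))

    def forward(self, z):
        z1, z2 = z[: len(z) // 2], z[len(z) // 2 :]
        return np.concatenate([z1, z2 + z1 @ self.W])

    def inverse(self, t):
        t1, t2 = t[: len(t) // 2], t[len(t) // 2 :]
        return np.concatenate([t1, t2 - t1 @ self.W])

# Translate a (pretend) hidden representation z into factors t and back.
layer = AdditiveCoupling(dim=8)
z = np.arange(8, dtype=float)
t = layer.forward(z)        # "interpretable" factor space
z_rec = layer.inverse(t)    # exact reconstruction of z

# Editing one factor in t and inverting produces a modified z --
# the mechanism behind semantically guided manipulation.
t_edit = t.copy()
t_edit[-1] += 1.0
z_edit = layer.inverse(t_edit)
```

In practice the coupling function is a learned neural network and several such layers are stacked, but invertibility holds by construction regardless of what the coupling function computes, which is why the mapping can sit on top of a frozen, pretrained model.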