Paper Title

REX: Reasoning-aware and Grounded Explanation

Paper Authors

Shi Chen, Qi Zhao

Paper Abstract

Effectiveness and interpretability are two essential properties for trustworthy AI systems. Most recent studies in visual reasoning are dedicated to improving the accuracy of predicted answers, and less attention is paid to explaining the rationales behind the decisions. As a result, they commonly take advantage of spurious biases instead of actually reasoning on the visual-textual data, and have yet to develop the capability to explain their decision making by considering key information from both modalities. This paper aims to close the gap from three distinct perspectives: first, we define a new type of multi-modal explanation that explains the decisions by progressively traversing the reasoning process and grounding keywords in the images. We develop a functional program to sequentially execute different reasoning steps and construct a new dataset with 1,040,830 multi-modal explanations. Second, we identify the critical need to tightly couple important components across the visual and textual modalities for explaining the decisions, and propose a novel explanation-generation method that explicitly models the pairwise correspondence between words and regions of interest. It improves the visual grounding capability by a considerable margin, resulting in enhanced interpretability and reasoning performance. Finally, with our new data and method, we perform extensive analyses to study the effectiveness of our explanations under different settings, including multi-task learning and transfer learning. Our code and data are available at https://github.com/szzexpoi/rex.
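The "functional program" idea can be made concrete with a toy example. The sketch below is a minimal, hypothetical Python illustration of executing reasoning steps sequentially over a tiny scene graph; the function names (`select`, `relate`, `query`) and the scene format are assumptions for illustration and do not reproduce the program used to build the REX dataset.

```python
# Toy functional program that executes reasoning steps one by one
# (hypothetical sketch; the dataset-construction program in the paper differs).

def select(scene, name):
    """Return the objects in the scene matching a name."""
    return [o for o in scene if o["name"] == name]

def relate(scene, objects, relation):
    """Follow a relation from the selected objects to new objects."""
    return [scene[t] for o in objects for r, t in o["relations"] if r == relation]

def query(objects, attribute):
    """Read off an attribute of the remaining object."""
    return objects[0][attribute]

# Minimal scene graph: objects indexed by their position in the list.
scene = [
    {"name": "table", "color": "brown", "relations": [("on", 1)]},
    {"name": "cup", "color": "red", "relations": []},
]

# "What color is the thing on the table?" as sequential reasoning steps.
step1 = select(scene, "table")
step2 = relate(scene, step1, "on")
answer = query(step2, "color")
print(answer)  # red
```

Similarly, the pairwise correspondence between words and regions of interest that the abstract describes can be sketched as a word-by-region similarity matrix normalized into a grounding distribution. This too is a rough sketch under assumed shapes and random features, not the model in the linked repository.

```python
import torch
import torch.nn.functional as F

# Assumed shapes: T words in the explanation, R detected regions, D feature dims.
T, R, D = 12, 36, 512

word_emb = torch.randn(T, D)     # embeddings of explanation words (illustrative)
region_feat = torch.randn(R, D)  # visual features of regions of interest (illustrative)

# Pairwise word-region correspondence: a T x R scaled similarity matrix.
scores = word_emb @ region_feat.t() / D ** 0.5

# Normalize over regions so each word distributes its grounding across regions.
grounding = F.softmax(scores, dim=-1)  # shape (T, R)

# The most strongly grounded region for each generated word.
best_region = grounding.argmax(dim=-1)  # shape (T,)
print(best_region)
```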
