Paper Title
An Empirical Study on Explanations in Out-of-Domain Settings
Paper Authors
Paper Abstract
Recent work in Natural Language Processing has focused on developing approaches that extract faithful explanations, either by identifying the most important tokens in the input (i.e. post-hoc explanations) or by designing inherently faithful models that first select the most important tokens and then use them to predict the correct label (i.e. select-then-predict models). Currently, these approaches are largely evaluated in in-domain settings. Yet, little is known about how post-hoc explanations and inherently faithful models perform in out-of-domain settings. In this paper, we conduct an extensive empirical study that examines: (1) the out-of-domain faithfulness of post-hoc explanations generated by five feature attribution methods; and (2) the out-of-domain performance of two inherently faithful models over six datasets. Contrary to our expectations, the results show that in many cases out-of-domain post-hoc explanation faithfulness, measured by sufficiency and comprehensiveness, is higher than in-domain faithfulness. We find this result misleading and suggest using a random baseline as a yardstick for evaluating post-hoc explanation faithfulness. Our findings also show that select-then-predict models achieve predictive performance in out-of-domain settings comparable to that of full-text trained models.
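For concreteness, below is a minimal sketch (not from the paper) of how the sufficiency and comprehensiveness metrics mentioned in the abstract are typically computed, along with the random-rationale yardstick the abstract proposes. The `predict_proba` callable is a hypothetical stand-in for any classifier that maps a token sequence to the probability it assigns to its originally predicted class; function names and signatures here are illustrative assumptions, not the authors' implementation.

```python
# Sketch of ERASER-style faithfulness metrics (following DeYoung et al., 2020).
# Assumptions: `predict_proba` is a hypothetical classifier interface returning
# p(y_hat | tokens) for the class y_hat predicted on the full input.
import random
from typing import Callable, List

def sufficiency(tokens: List[str],
                rationale_idx: List[int],
                predict_proba: Callable[[List[str]], float]) -> float:
    """p(y|x) - p(y|rationale only): lower means the rationale alone
    is sufficient to recover the original prediction."""
    rationale = [tokens[i] for i in rationale_idx]
    return predict_proba(tokens) - predict_proba(rationale)

def comprehensiveness(tokens: List[str],
                      rationale_idx: List[int],
                      predict_proba: Callable[[List[str]], float]) -> float:
    """p(y|x) - p(y|x with rationale removed): higher means the rationale
    captures most of the evidence the model relied on."""
    drop = set(rationale_idx)
    remainder = [t for i, t in enumerate(tokens) if i not in drop]
    return predict_proba(tokens) - predict_proba(remainder)

def random_baseline_idx(n_tokens: int, rationale_len: int) -> List[int]:
    """Random rationale of equal length: the yardstick the abstract suggests.
    An attribution method is only informative if its rationale beats this."""
    return random.sample(range(n_tokens), rationale_len)
```

Under this reading, a feature attribution method's out-of-domain scores would be judged not in isolation but against the sufficiency and comprehensiveness obtained with `random_baseline_idx` rationales of the same length.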