Paper Title
Are Interpretations Fairly Evaluated? A Definition Driven Pipeline for Post-Hoc Interpretability
Paper Authors
Paper Abstract
Recent years have witnessed an increasing number of interpretation methods being developed to improve the transparency of NLP models. Meanwhile, researchers have also tried to answer the question: is the obtained interpretation faithful in explaining the mechanism behind a model's prediction? Specifically, Jain and Wallace (2019) argue that "attention is not explanation" by comparing attention-based interpretations with gradient-based alternatives. However, this raises a new question: can we safely pick one interpretation method as the ground truth? If not, on what basis can we compare different interpretation methods? In this work, we propose that a concrete definition of interpretation is crucial before we can evaluate the faithfulness of an interpretation. The definition affects both the algorithm used to obtain the interpretation and, more importantly, the metric used in evaluation. Through both theoretical and experimental analysis, we find that although interpretation methods perform differently under a given evaluation metric, the difference may not result from interpretation quality or faithfulness, but rather from the inherent bias of the evaluation metric.
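To make the abstract's final claim concrete, here is a minimal, hypothetical sketch (not from the paper) of how an evaluation metric can favor one interpretation method by construction. The toy linear model, the random "attention" weights, and the deletion-style metric below are all illustrative assumptions, not the authors' actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.9, 0.1, 0.5, -0.3, 0.7])  # weights of a toy linear "model"

def toy_model(x: np.ndarray) -> float:
    """Toy 'prediction': a fixed linear score over 5 token features."""
    return float(w @ x)

x = rng.normal(size=5)                 # token features for one input
attention = rng.dirichlet(np.ones(5))  # hypothetical attention weights
gradient = w * x                       # gradient-times-input saliency

def deletion_drop(scores: np.ndarray, k: int = 2) -> float:
    """Deletion metric: zero out the k highest-scored tokens and report
    the drop in model score. Under this metric's implicit definition of
    interpretation, a larger drop counts as 'more faithful'."""
    top = np.argsort(scores)[-k:]
    x_masked = x.copy()
    x_masked[top] = 0.0
    return toy_model(x) - toy_model(x_masked)

print("attention drop:", deletion_drop(attention))
print("gradient  drop:", deletion_drop(gradient))
# For a linear model, gradient-times-input equals each token's exact
# additive contribution, so the deletion metric favors it by construction:
# the metric's bias, not intrinsic faithfulness, decides the comparison.
```

In this sketch, the gradient-based scores win not because they are intrinsically better explanations, but because the deletion metric encodes the same definition of importance that gradient-times-input computes, which is the kind of metric bias the abstract describes.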