Paper Title

Explaining NLP Models via Minimal Contrastive Editing (MiCE)

Paper Authors

Alexis Ross, Ana Marasović, Matthew E. Peters

Paper Abstract

Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the influential role that contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for producing contrastive explanations of model predictions in the form of edits to inputs that change model outputs to the contrast case. Our experiments across three tasks--binary sentiment classification, topic classification, and multiple-choice question answering--show that MiCE is able to produce edits that are not only contrastive, but also minimal and fluent, consistent with human contrastive edits. We demonstrate how MiCE edits can be used for two use cases in NLP system development--debugging incorrect model outputs and uncovering dataset artifacts--and thereby illustrate that producing contrastive explanations is a promising research direction for model interpretability.
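To make the idea of a contrastive edit concrete, the sketch below runs a brute-force search for a minimal single-token edit that flips a classifier's prediction to the contrast case. This is purely illustrative: the actual MiCE method fine-tunes a T5-based editor to propose fluent edits against a real model, whereas everything here (the lexicon classifier, the candidate vocabulary, the search procedure) is an assumed toy stand-in.

```python
# Illustrative sketch only: brute-force search for a minimal contrastive
# edit against a toy lexicon-based sentiment classifier. MiCE itself uses
# a learned editor model; this toy setup just demonstrates the goal of a
# contrastive edit: a small input change that yields the contrast label.

POSITIVE = {"great", "wonderful", "loved"}
NEGATIVE = {"terrible", "boring", "hated"}

def toy_classifier(tokens):
    """Label a token list by counting lexicon hits (toy stand-in model)."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score >= 0 else "negative"

def minimal_contrastive_edit(tokens, contrast_label, vocab):
    """Try every single-token substitution, in order, and return the first
    edited input the classifier assigns the contrast label, or None."""
    for i in range(len(tokens)):
        for cand in vocab:
            edited = tokens[:i] + [cand] + tokens[i + 1:]
            if toy_classifier(edited) == contrast_label:
                return edited
    return None

original = "the movie was terrible".split()
edited = minimal_contrastive_edit(original, "positive", sorted(POSITIVE | NEGATIVE))
```

Here the edit changes exactly one token, so it is minimal by construction; the real method additionally optimizes for fluency, which a substitution search over a fixed vocabulary does not capture.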
