Paper Title
Adversarial Infidelity Learning for Model Interpretation
Paper Authors
Paper Abstract
Model interpretation is essential in data mining and knowledge discovery. It can help understand the intrinsic model working mechanism and check whether the model has undesired characteristics. A popular way of performing model interpretation is Instance-wise Feature Selection (IFS), which provides an importance score for each feature of a data sample to explain how the model generates its specific output. In this paper, we propose a Model-agnostic Effective Efficient Direct (MEED) IFS framework for model interpretation, mitigating concerns about sanity, combinatorial shortcuts, model identifiability, and information transmission. We also focus on the following setting: using the selected features to directly predict the output of the given model, which serves as a primary evaluation metric for model-interpretation methods. Apart from the features, we take the output of the given model as an additional input, so that the explainer is learned from more accurate information. To learn the explainer, besides a fidelity objective, we propose an Adversarial Infidelity Learning (AIL) mechanism that boosts explanation learning by screening out relatively unimportant features. Through theoretical and experimental analysis, we show that our AIL mechanism can help learn the desired conditional distribution between the selected features and the targets. Moreover, we extend our framework by integrating efficient interpretation methods as proper priors to provide a warm start. Comprehensive empirical evaluation results, obtained via quantitative metrics and human evaluation, demonstrate the effectiveness and superiority of our proposed method. Our code is publicly available online at https://github.com/langlrsw/MEED.
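To make the IFS setting the abstract describes concrete, here is a minimal sketch, in NumPy, of evaluating fidelity: an explainer assigns per-feature importance scores for one instance, the top-k features are kept (the rest zeroed out), and the masked input is fed back to the black-box model to see how well it reproduces the original output. The linear `black_box` model and the score heuristic `importance_scores` are hypothetical stand-ins, not the paper's MEED explainer, which is learned adversarially.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical black-box model: a fixed linear scorer standing in for
# any model whose outputs we want to explain.
W = rng.normal(size=10)

def black_box(x):
    return x @ W

def importance_scores(x):
    # Hypothetical heuristic explainer: importance = |feature * weight|.
    # A learned explainer (as in the paper) would replace this.
    return np.abs(x * W)

def top_k_mask(x, k):
    """Keep the k highest-scoring features of x, zero out the rest."""
    idx = np.argsort(importance_scores(x))[-k:]
    mask = np.zeros_like(x)
    mask[idx] = 1.0
    return x * mask

x = rng.normal(size=10)
full = black_box(x)
approx = black_box(top_k_mask(x, k=3))
# Fidelity: how closely the prediction from selected features alone
# matches the prediction on the full input (smaller gap = better).
fidelity_gap = abs(full - approx)
```

Evaluation methods in this family differ mainly in how the scores are produced and in what replaces the zeroed-out features (zeros, noise, or a reference value); the masking-and-repredicting loop above is the common core.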