Paper title
Explaining the Explainer: A First Theoretical Analysis of LIME
Paper authors
Paper abstract
Machine learning is used more and more often for sensitive applications, sometimes replacing humans in critical decision-making processes. As such, interpretability of these algorithms is a pressing need. One popular algorithm to provide interpretability is LIME (Local Interpretable Model-Agnostic Explanation). In this paper, we provide the first theoretical analysis of LIME. We derive closed-form expressions for the coefficients of the interpretable model when the function to explain is linear. The good news is that these coefficients are proportional to the gradient of the function to explain: LIME indeed discovers meaningful features. However, our analysis also reveals that poor choices of parameters can lead LIME to miss important features.
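The proportionality claim can be illustrated with a small numerical sketch. This is not the tabular LIME procedure analyzed in the paper, which works on discretized, binary interpretable features. It is a simplified LIME-style surrogate, assuming Gaussian perturbations around the point to explain, an exponential kernel for the sample weights, and a weighted least-squares fit on the raw features. The function name `local_linear_surrogate` and the parameters `sigma` and `bandwidth` are hypothetical names chosen for illustration. In this simplified setting, when the function to explain is linear, the fitted coefficients match its gradient up to numerical error.

```python
import numpy as np


def local_linear_surrogate(f, x, n_samples=2000, sigma=1.0, bandwidth=1.0, seed=0):
    """Fit a weighted linear surrogate to f around x (simplified, LIME-style).

    Assumptions (not from the paper): Gaussian perturbations with scale
    `sigma`, exponential kernel weights with width `bandwidth`, and a
    regression on raw features rather than discretized ones.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]

    # Perturb the point to explain and query the model on each sample.
    samples = x + sigma * rng.standard_normal((n_samples, d))
    y = np.array([f(s) for s in samples])

    # Samples closer to x receive larger weights.
    dist2 = np.sum((samples - x) ** 2, axis=1)
    weights = np.exp(-dist2 / (2.0 * bandwidth ** 2))

    # Weighted least squares: scale rows by sqrt(weight), then solve.
    design = np.hstack([np.ones((n_samples, 1)), samples])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]


# A linear function to explain; its gradient is constant everywhere.
gradient = np.array([2.0, -1.0, 0.0])


def f(v):
    return float(gradient @ v) + 0.5


x = np.array([1.0, 2.0, 3.0])
intercept, coefs = local_linear_surrogate(f, x)
print(coefs)  # close to the gradient of f
```

The third feature has zero gradient and receives a coefficient near zero, so the surrogate does not flag it as important. Reproducing the abstract's warning about poor parameter choices would require the discretization step of the actual algorithm, which this sketch leaves out.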