Paper Title
Analyzing Explainer Robustness via Probabilistic Lipschitzness of Prediction Functions
Paper Authors
Paper Abstract
Machine learning methods have significantly improved in their predictive capabilities, but at the same time they are becoming more complex and less transparent. As a result, explainers are often relied on to provide interpretability to these black-box prediction models. Because explainers serve as crucial diagnostic tools, it is important that they themselves are robust. In this paper we focus on one particular aspect of robustness, namely that an explainer should give similar explanations for similar data inputs. We formalize this notion by introducing and defining explainer astuteness, analogous to astuteness of prediction functions. Our formalism allows us to connect explainer robustness to the predictor's probabilistic Lipschitzness, which captures the probability of local smoothness of a function. We provide lower bound guarantees on the astuteness of a variety of explainers (e.g., SHAP, RISE, CXPlain) given the Lipschitzness of the prediction function. These theoretical results imply that locally smooth prediction functions lend themselves to locally robust explanations. We evaluate these results empirically on simulated as well as real datasets.
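For readers unfamiliar with the term, probabilistic Lipschitzness is commonly formalized along the following lines (a sketch based on the broader robustness literature; the symbols L, r, and α are illustrative, and the exact definition used in the paper may differ):

```latex
% A prediction function f is probabilistically Lipschitz with
% constant L, radius r, and probability 1 - \alpha if, for inputs
% drawn from the data distribution \mathcal{D},
\Pr_{x_1, x_2 \sim \mathcal{D}}\!\left[
  \,\|f(x_1) - f(x_2)\| \le L \,\|x_1 - x_2\|
  \;\middle|\; \|x_1 - x_2\| \le r \,\right] \ge 1 - \alpha
```

Setting α = 0 recovers the usual deterministic local Lipschitz condition; a larger α allows the smoothness bound to fail on a small fraction of nearby input pairs, which is what lets the notion apply to learned black-box predictors.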