GraphFramEx: Towards Systematic Evaluation of Explainability Methods for Graph Neural Networks

Paper Title

GraphFramEx: Towards Systematic Evaluation of Explainability Methods for Graph Neural Networks

Authors

Kenza Amara, Rex Ying, Zitao Zhang, Zhihao Han, Yinan Shan, Ulrik Brandes, Sebastian Schemm, Ce Zhang

Abstract

Graph neural networks (GNNs) are among the most popular machine learning models today and have recently attracted intense interest, as has their explainability. Users are increasingly interested in better understanding GNN models and their outcomes. Unfortunately, today's evaluation frameworks for GNN explainability often rely on a few inadequate synthetic datasets, leading to conclusions of limited scope due to a lack of complexity in the problem instances. As GNN models are deployed to more mission-critical applications, we are in dire need of a common evaluation protocol for GNN explainability methods. In this paper, we propose, to the best of our knowledge, the first systematic evaluation framework for GNN explainability, considering explainability with respect to three different "user needs". We propose a unique metric that combines the fidelity measures and classifies explanations according to whether they are sufficient or necessary. We restrict our scope to node classification tasks and compare the most representative techniques in the field of input-level explainability for GNNs. On the inadequate but widely used synthetic benchmarks, surprisingly, shallow techniques such as personalized PageRank achieve the best performance at minimal computation time. When the graph structure is more complex and nodes have meaningful features, however, gradient-based methods perform best according to our evaluation criteria. No method dominates the others on all evaluation dimensions, and there is always a trade-off. We further apply our evaluation protocol in a case study on fraud explanation in eBay transaction graphs to reflect the production environment.
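The abstract describes a single metric that combines the fidelity measures but does not give its formula. As a rough illustration only, the sketch below assumes two fidelity values in [0, 1]: `fid_plus` (how much the prediction changes when the explanation is removed, high meaning the explanation is necessary) and `fid_minus` (how much it changes when only the explanation is kept, low meaning the explanation is sufficient). It combines them with a weighted harmonic mean. The function names, the weights, and the 0.5 threshold are assumptions made for this example, not details taken from the abstract.

```python
def combined_fidelity_score(fid_plus, fid_minus, w_plus=0.5, w_minus=0.5):
    """Weighted harmonic mean of fid_plus and (1 - fid_minus).

    Assumed form for illustration; the abstract does not state the formula.
    """
    if not (0.0 <= fid_plus <= 1.0 and 0.0 <= fid_minus <= 1.0):
        raise ValueError("fidelity values must lie in [0, 1]")
    if w_plus <= 0.0 or w_minus <= 0.0:
        raise ValueError("weights must be positive")
    sufficiency = 1.0 - fid_minus
    # A harmonic mean is zero as soon as one of its components is zero.
    if fid_plus == 0.0 or sufficiency == 0.0:
        return 0.0
    return (w_plus + w_minus) / (w_plus / fid_plus + w_minus / sufficiency)


def classify_explanation(fid_plus, fid_minus, threshold=0.5):
    """Label an explanation as necessary and/or sufficient.

    The threshold is a hypothetical cut-off chosen for this example.
    """
    labels = []
    if fid_plus >= threshold:
        labels.append("necessary")
    if fid_minus <= 1.0 - threshold:
        labels.append("sufficient")
    return labels
```

Under these assumptions, a harmonic mean rewards explanations that are both necessary and sufficient: a high value on one measure cannot compensate for a value near zero on the other.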
