Paper Title

FreeKD: Free-direction Knowledge Distillation for Graph Neural Networks

Paper Authors

Kaituo Feng, Changsheng Li, Ye Yuan, Guoren Wang

Paper Abstract

Knowledge distillation (KD) has demonstrated its effectiveness in boosting the performance of graph neural networks (GNNs), where the goal is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often difficult to train a satisfactory teacher GNN in practice due to the well-known over-parameterization and over-smoothing issues, leading to ineffective knowledge transfer in real applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via Reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper, well-optimized teacher GNN. The core idea of our work is to collaboratively build two shallower GNNs that exchange knowledge with each other via reinforcement learning in a hierarchical way. Observing that a typical GNN model often performs better on some nodes and worse on others during training, we devise a dynamic, free-direction knowledge transfer strategy that consists of two levels of actions: 1) a node-level action determines the direction of knowledge transfer between the corresponding nodes of the two networks; and 2) a structure-level action determines which of the local structures generated by the node-level actions should be propagated. In essence, FreeKD is a general and principled framework that is naturally compatible with GNNs of different architectures. Extensive experiments on five benchmark datasets demonstrate that FreeKD outperforms the two base GNNs by a large margin and shows its efficacy for various GNNs. More surprisingly, FreeKD achieves comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.
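To make the free-direction idea more concrete, below is a minimal sketch of per-node, direction-aware distillation between two shallow GNNs trained together. It is not the paper's implementation: the hierarchical reinforcement-learning agents and the structure-level action are omitted, and the per-node transfer direction is chosen by a simple confidence heuristic instead of the learned node-level action. The names `ShallowGCN` and `free_direction_kd_step`, the choice of PyTorch Geometric's `GCNConv`, and the `data` fields (`x`, `edge_index`, `y`, `train_mask`) are illustrative assumptions.

```python
# Simplified sketch of free-direction knowledge exchange between two shallow GNNs.
# NOT the paper's RL-based method: direction is picked by a confidence heuristic,
# and the structure-level action is omitted.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # assumed base GNN layer


class ShallowGCN(torch.nn.Module):
    """A two-layer GCN producing node classification logits."""
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)


def _masked_mean(t, mask):
    # Mean over selected entries; zero if the mask selects nothing.
    return t[mask].mean() if mask.any() else t.new_zeros(())


def free_direction_kd_step(net_a, net_b, data, opt_a, opt_b, kd_weight=1.0, T=2.0):
    """One joint training step: each network teaches the other on the nodes
    where it is currently more confident (stand-in for the node-level action)."""
    logits_a = net_a(data.x, data.edge_index)
    logits_b = net_b(data.x, data.edge_index)

    # Supervised losses on labeled nodes.
    ce_a = F.cross_entropy(logits_a[data.train_mask], data.y[data.train_mask])
    ce_b = F.cross_entropy(logits_b[data.train_mask], data.y[data.train_mask])

    # Node-level direction (simplified): the more confident network on a node
    # acts as that node's teacher.
    with torch.no_grad():
        conf_a = F.softmax(logits_a, dim=-1).max(dim=-1).values
        conf_b = F.softmax(logits_b, dim=-1).max(dim=-1).values
        a_teaches = conf_a >= conf_b  # True where net_a teaches net_b

    log_p_a = F.log_softmax(logits_a / T, dim=-1)
    log_p_b = F.log_softmax(logits_b / T, dim=-1)
    q_a = F.softmax(logits_a / T, dim=-1).detach()
    q_b = F.softmax(logits_b / T, dim=-1).detach()

    # Per-node KL(student || teacher), masked by the chosen direction.
    kd_b_from_a = _masked_mean(F.kl_div(log_p_b, q_a, reduction='none').sum(-1), a_teaches)
    kd_a_from_b = _masked_mean(F.kl_div(log_p_a, q_b, reduction='none').sum(-1), ~a_teaches)

    loss = ce_a + ce_b + kd_weight * (kd_a_from_b + kd_b_from_a)
    opt_a.zero_grad()
    opt_b.zero_grad()
    loss.backward()
    opt_a.step()
    opt_b.step()
    return loss.item()
```

In the full FreeKD framework described in the abstract, the confidence heuristic above would be replaced by the learned node-level action, and a structure-level action would additionally select which local structures (neighborhoods) to propagate.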
