Paper Title
Residual Correlation in Graph Neural Network Regression
Paper Authors
Paper Abstract
A graph neural network transforms features in each vertex's neighborhood into a vector representation of the vertex. Afterward, each vertex's representation is used independently for predicting its label. This standard pipeline implicitly assumes that vertex labels are conditionally independent given their neighborhood features. However, this is a strong assumption, and we show that it is far from true on many real-world graph datasets. Focusing on regression tasks, we find that this conditional independence assumption severely limits predictive power. This should not be that surprising, given that traditional graph-based semi-supervised learning methods, such as label propagation, work in the opposite fashion by explicitly modeling the correlation in predicted outcomes. Here, we address this problem with an interpretable and efficient framework that can improve any graph neural network architecture simply by exploiting correlation structure in the regression residuals. In particular, we model the joint distribution of residuals on vertices with a parameterized multivariate Gaussian, and estimate the parameters by maximizing the marginal likelihood of the observed labels. Our framework achieves substantially higher accuracy than competing baselines, and the learned parameters can be interpreted as the strength of correlation among connected vertices. Furthermore, we develop linear-time algorithms for low-variance, unbiased model parameter estimates, allowing us to scale to large networks. We also provide a basic version of our method that makes stronger assumptions on correlation structure but is painless to implement, often leading to great practical performance with minimal overhead.
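The modeling step described in the abstract lends itself to a compact illustration. Below is a minimal sketch (not the authors' code) assuming one natural parameterization of the residual precision matrix: residuals r ~ N(0, Gamma^{-1}) with Gamma = beta * (I - alpha * S), where S is the symmetric normalized adjacency matrix, beta > 0 scales the overall variance, and |alpha| < 1 controls the strength of correlation across edges. All names here (S, obs_idx, test_idx, gnn_pred) are illustrative placeholders, not the paper's API.

```python
import numpy as np
from scipy.optimize import minimize

def neg_marginal_loglik(params, S, resid_obs, obs_idx):
    # Exact negative log marginal likelihood of the observed residuals under
    # r ~ N(0, Gamma^{-1}) with Gamma = beta * (I - alpha * S).
    # Dense O(n^3) version, a toy for small graphs only; the paper's
    # linear-time algorithms would replace the exact log-determinant and
    # solves with low-variance, unbiased stochastic estimates.
    alpha = np.tanh(params[0])             # keep |alpha| < 1 so Gamma stays PD
    beta = np.exp(params[1])               # keep beta > 0
    n = S.shape[0]
    Gamma = beta * (np.eye(n) - alpha * S)
    Sigma = np.linalg.inv(Gamma)           # joint covariance over all vertices
    Sigma_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    _, logdet = np.linalg.slogdet(Sigma_oo)
    quad = resid_obs @ np.linalg.solve(Sigma_oo, resid_obs)
    return 0.5 * (logdet + quad + len(obs_idx) * np.log(2 * np.pi))

def predict_residuals(alpha, beta, S, resid_obs, obs_idx, test_idx):
    # Conditional mean of a Gaussian given its observed block:
    # E[r_test | r_obs] = -Gamma_tt^{-1} Gamma_to r_obs.
    n = S.shape[0]
    Gamma = beta * (np.eye(n) - alpha * S)
    G_tt = Gamma[np.ix_(test_idx, test_idx)]
    G_to = Gamma[np.ix_(test_idx, obs_idx)]
    return -np.linalg.solve(G_tt, G_to @ resid_obs)

# Usage sketch: residuals come from any pre-trained GNN's predictions.
# resid_obs = y_obs - gnn_pred[obs_idx]
# res = minimize(neg_marginal_loglik, x0=np.zeros(2),
#                args=(S, resid_obs, obs_idx), method="Nelder-Mead")
# alpha, beta = np.tanh(res.x[0]), np.exp(res.x[1])
# y_test = gnn_pred[test_idx] + predict_residuals(alpha, beta, S,
#                                                 resid_obs, obs_idx, test_idx)
```

The dense computation above scales cubically in the number of vertices, so it is only a toy: the linear-time algorithms mentioned in the abstract would swap the exact inverse and log-determinant for unbiased stochastic estimates combined with iterative solvers, which is what makes the approach scale to large networks.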