Paper Title

COIN: Co-Cluster Infomax for Bipartite Graphs

Authors

Baoyu Jing, Yuchen Yan, Yada Zhu, Hanghang Tong

Abstract

Bipartite graphs are powerful data structures for modeling interactions between two types of nodes, and they have been used in a variety of applications, such as recommender systems, information retrieval, and drug discovery. A fundamental challenge for bipartite graphs is how to learn informative node embeddings. Despite the success of recent self-supervised learning methods on bipartite graphs, their objectives discriminate between instance-wise positive and negative node pairs, which can contain cluster-level errors. In this paper, we introduce a novel co-cluster infomax (COIN) framework, which captures cluster-level information by maximizing the mutual information of co-clusters. Unlike previous infomax methods, which estimate mutual information with neural networks, COIN can calculate mutual information easily. In addition, COIN is an end-to-end co-clustering method that can be trained jointly with other objective functions and optimized via back-propagation. We also provide a theoretical analysis of COIN: we prove that COIN is able to effectively increase the mutual information of node embeddings, and that COIN is upper-bounded by the prior distributions of nodes. We extensively evaluate the proposed COIN framework on various benchmark datasets and tasks to demonstrate its effectiveness.
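To make the phrase "mutual information of co-clusters" concrete, here is a minimal sketch of one way such a quantity could be computed in closed form. It is an illustration only, not the objective defined in the paper: the abstract gives no formula, so the function name `co_cluster_mutual_information`, the soft-assignment inputs `p_u` and `p_v`, and the choice to treat each edge as one equally weighted sample of the joint cluster distribution are all assumptions made for this example.

```python
import math


def co_cluster_mutual_information(edges, p_u, p_v, eps=1e-12):
    """Mutual information between the cluster variables of the two node types.

    Assumed (hypothetical) inputs:
      edges: list of (u, v) index pairs, one per observed interaction
      p_u[u]: soft assignment of left-side node u over K clusters (sums to 1)
      p_v[v]: soft assignment of right-side node v over L clusters (sums to 1)
    """
    num_k = len(p_u[0])
    num_l = len(p_v[0])

    # Joint co-cluster distribution: average over edges of the outer
    # product of the two endpoint assignment vectors.
    joint = [[0.0] * num_l for _ in range(num_k)]
    for u, v in edges:
        for k in range(num_k):
            for l in range(num_l):
                joint[k][l] += p_u[u][k] * p_v[v][l] / len(edges)

    # Marginal cluster distributions for each side.
    marg_k = [sum(row) for row in joint]
    marg_l = [sum(joint[k][l] for k in range(num_k)) for l in range(num_l)]

    # I(K; L) = sum_{k,l} p(k,l) * log(p(k,l) / (p(k) * p(l))), in nats.
    mi = 0.0
    for k in range(num_k):
        for l in range(num_l):
            p = joint[k][l]
            if p > eps:  # skip empty cells, where the term's limit is 0
                mi += p * math.log(p / (marg_k[k] * marg_l[l]))
    return mi


# Two edges whose endpoints fall into perfectly aligned clusters.
aligned = co_cluster_mutual_information(
    edges=[(0, 0), (1, 1)],
    p_u=[[1.0, 0.0], [0.0, 1.0]],
    p_v=[[1.0, 0.0], [0.0, 1.0]],
)

# The same edges with uninformative (uniform) assignments.
uniform = co_cluster_mutual_information(
    edges=[(0, 0), (1, 1)],
    p_u=[[0.5, 0.5], [0.5, 0.5]],
    p_v=[[0.5, 0.5], [0.5, 0.5]],
)
```

Under these assumptions the quantity needs no auxiliary neural estimator, because it is a finite sum over cluster pairs. In this toy case the aligned assignments give log 2 nats and the uniform assignments give zero. In a trainable setting the assignment vectors would presumably come from a differentiable model, so that the same sum could be optimized by back-propagation, but that wiring is not shown here.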
