COIN: Co-Cluster Infomax for Bipartite Graphs

Paper Title

COIN: Co-Cluster Infomax for Bipartite Graphs

Authors

Baoyu Jing, Yuchen Yan, Yada Zhu, Hanghang Tong

Abstract

Bipartite graphs are powerful data structures for modeling interactions between two types of nodes, and they have been used in a variety of applications, such as recommender systems, information retrieval, and drug discovery. A fundamental challenge for bipartite graphs is how to learn informative node embeddings. Despite the success of recent self-supervised learning methods on bipartite graphs, their objectives discriminate instance-wise positive and negative node pairs, which could contain cluster-level errors. In this paper, we introduce a novel co-cluster infomax (COIN) framework, which captures cluster-level information by maximizing the mutual information of co-clusters. Unlike previous infomax methods, which estimate mutual information with neural networks, COIN can easily calculate mutual information. Besides, COIN is an end-to-end co-clustering method that can be trained jointly with other objective functions and optimized via back-propagation. Furthermore, we provide a theoretical analysis of COIN: we prove that COIN is able to effectively increase the mutual information of node embeddings and that COIN is upper-bounded by the prior distributions of nodes. We extensively evaluate the proposed COIN framework on various benchmark datasets and tasks to demonstrate its effectiveness.
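The abstract says COIN calculates mutual information directly instead of estimating it with a neural network. A standard information-theoretic co-clustering calculation illustrates that idea. This is a minimal sketch under stated assumptions, not the paper's objective.

The sketch makes three assumptions:

- Each node type has a soft cluster-assignment matrix.
- The joint distribution over (row cluster, column cluster) pairs is the average of co-assignments over the edges of an unweighted graph.
- I(C_U; C_V) is then evaluated in closed form from that table.

The names `cocluster_joint`, `mutual_information`, and `entropy` are hypothetical. Pure Python keeps the arithmetic visible. In training, the same operations would be written with tensors in an autodiff framework so that gradients reach whatever encoder produces the assignments.

```python
import math
from typing import List, Sequence, Tuple

# Row i of an assignment matrix is a probability distribution over clusters.
Assignment = Sequence[Sequence[float]]


def cocluster_joint(
    edges: Sequence[Tuple[int, int]], p_u: Assignment, p_v: Assignment
) -> List[List[float]]:
    """Estimate p(c_u = i, c_v = j) by averaging soft co-assignments over edges.

    edges: (u, v) index pairs of an unweighted bipartite graph (assumption).
    p_u[u][i]: probability that type-U node u belongs to row cluster i.
    p_v[v][j]: probability that type-V node v belongs to column cluster j.
    """
    k_u, k_v = len(p_u[0]), len(p_v[0])
    joint = [[0.0] * k_v for _ in range(k_u)]
    for u, v in edges:
        for i in range(k_u):
            for j in range(k_v):
                # Each edge spreads one unit of mass over cluster pairs.
                joint[i][j] += p_u[u][i] * p_v[v][j]
    n = len(edges)
    return [[x / n for x in row] for row in joint]


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def mutual_information(joint: Sequence[Sequence[float]]) -> float:
    """I(C_U; C_V) in nats, computed directly from the joint table."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log(p / (row_marg[i] * col_marg[j]))
    return mi


if __name__ == "__main__":
    # Toy graph: two disjoint 2x2 blocks of interactions.
    edges = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 2), (2, 3), (3, 2), (3, 3)]
    hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    uniform = [[0.5, 0.5]] * 4

    aligned = cocluster_joint(edges, hard, hard)
    print(mutual_information(aligned))  # log(2), about 0.693
    print(mutual_information(cocluster_joint(edges, uniform, uniform)))  # 0.0
```

Any such joint table satisfies the standard inequality I(C_U; C_V) ≤ min(H(C_U), H(C_V)), and the block-aligned toy example attains it. This generic bound is shown only for intuition. The paper states its own bound in terms of the prior distributions of nodes, which may take a different form.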
