Paper Title
Integrated Node Encoder for Labelled Textual Networks
Paper Authors
Paper Abstract
Numerous works have exploited content-enhanced network embedding models, with little focus on the label information of nodes. Although TriDNR leverages node labels by treating them as node attributes, it fails to enrich unlabelled node vectors with label information, which leads to weaker classification results on the test set than those of existing unsupervised textual network embedding models. In this study, we design an Integrated Node Encoder (INE) for textual networks that is jointly trained on structure-based and label-based objectives. As a result, the node encoder preserves integrated knowledge of not only the network text and structure but also the label information. Furthermore, INE allows label-enhanced vectors to be created for unlabelled nodes by feeding in their node contents. Our node embeddings achieve state-of-the-art performance on the classification task over two public citation networks, Cora and DBLP, pushing the benchmarks up by 10.0\% and 12.1\%, respectively, at a 70\% training ratio. Additionally, we propose a feasible solution that generalizes our model from textual networks to a broader range of networks.
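To make the joint-training idea in the abstract concrete, the following is a minimal sketch (not the paper's actual INE architecture; the encoder form, dimensions, losses, and all variable names here are illustrative assumptions). A shared linear encoder maps node text features to embeddings; a structure-based objective pulls linked nodes together, while a label-based cross-entropy objective is applied only to labelled nodes, and both gradients flow through the same encoder. Because the encoder is a function of node content, unlabelled nodes still obtain label-enhanced vectors.

```python
import numpy as np

# Illustrative joint training of a shared node encoder on a
# structure-based and a label-based objective (an assumption-laden
# sketch, not the published INE model).
rng = np.random.default_rng(0)

n_nodes, text_dim, emb_dim, n_classes = 6, 10, 4, 2
X = rng.normal(size=(n_nodes, text_dim))      # node text features
edges = [(0, 1), (1, 2), (3, 4), (4, 5)]      # citation links
labels = {0: 0, 1: 0, 3: 1}                   # only some nodes are labelled

W = rng.normal(scale=0.1, size=(text_dim, emb_dim))   # shared encoder weights
C = rng.normal(scale=0.1, size=(emb_dim, n_classes))  # label head

def encode(x, W):
    return x @ W

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 0.05
for step in range(300):
    Z = encode(X, W)
    gW = np.zeros_like(W)
    gC = np.zeros_like(C)
    # Structure-based objective: 0.5 * ||z_i - z_j||^2 for each edge,
    # pulling embeddings of linked nodes together.
    for i, j in edges:
        diff = Z[i] - Z[j]
        gW += np.outer(X[i] - X[j], diff)
    # Label-based objective: cross-entropy on labelled nodes only,
    # back-propagated through the same shared encoder.
    for i, y in labels.items():
        p = softmax(Z[i] @ C)
        p[y] -= 1.0                      # grad of -log p_y w.r.t. logits
        gC += np.outer(Z[i], p)
        gW += np.outer(X[i], p @ C.T)
    W -= lr * gW / n_nodes
    C -= lr * gC / len(labels)

# An unlabelled node (e.g. node 2) still gets a label-enhanced vector,
# since the trained encoder only needs its text features as input.
z_unlabelled = encode(X[2], W)
```

The key design point the sketch mirrors is that both objectives update the same encoder `W`, so label signal from labelled nodes reshapes the content-to-embedding map used for every node, labelled or not.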