Paper Title


Exploring Effects of Random Walk Based Minibatch Selection Policy on Knowledge Graph Completion

Authors

Bishal Santra, Prakhar Sharma, Sumegh Roychowdhury, Pawan Goyal

Abstract


In this paper, we have explored the effects of different minibatch sampling techniques in Knowledge Graph Completion. Knowledge Graph Completion (KGC), or Link Prediction, is the task of predicting missing facts in a knowledge graph. KGC models are usually trained using a margin, soft-margin, or cross-entropy loss function that promotes assigning higher scores or probabilities to true fact triplets. Minibatch gradient descent is used to optimize these loss functions when training KGC models. However, as each minibatch consists of only a few triplets randomly sampled from a large knowledge graph, any entity that occurs in a minibatch usually occurs only once. Because of this, these loss functions ignore all other neighbors of the entity whose embedding is being updated at a given minibatch step. In this paper, we propose a new random-walk based minibatch sampling technique for training KGC models, which optimizes the loss incurred by a minibatch of closely connected triplets forming a subgraph, instead of randomly selected ones. We report experimental results for different models and datasets with our sampling technique and find that the proposed sampling algorithm has varying effects across these datasets/models. Specifically, we find that our proposed method achieves state-of-the-art performance on the DB100K dataset.
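The abstract describes replacing uniform random triplet sampling with a random walk over the knowledge graph, so that each minibatch is a connected subgraph. The paper's exact algorithm is not given on this page; below is a minimal hedged sketch of what such a sampler could look like. All names (`random_walk_minibatch`, the restart behavior on dead ends) are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

def random_walk_minibatch(triplets, batch_size, seed=None):
    """Sample a minibatch of closely connected triplets via a random walk.

    Instead of drawing `batch_size` triplets uniformly at random, start at a
    random entity and repeatedly hop to a neighboring entity through an
    unvisited triplet, so the collected batch forms a connected subgraph
    (modulo restarts when the walk hits a dead end).
    """
    rng = random.Random(seed)

    # Adjacency index: entity -> list of incident (head, relation, tail) triplets.
    adj = defaultdict(list)
    for h, r, t in triplets:
        adj[h].append((h, r, t))
        adj[t].append((h, r, t))

    batch_size = min(batch_size, len(set(triplets)))  # avoid an endless walk
    entities = list(adj)
    batch, visited = [], set()
    current = rng.choice(entities)

    while len(batch) < batch_size:
        candidates = [tr for tr in adj[current] if tr not in visited]
        if not candidates:
            # Dead end: restart the walk from a fresh random entity.
            current = rng.choice(entities)
            continue
        h, r, t = rng.choice(candidates)
        visited.add((h, r, t))
        batch.append((h, r, t))
        # Hop across the sampled edge to the other endpoint.
        current = t if current == h else h

    return batch
```

In a training loop, this sampler would simply replace the uniform shuffling step that produces each minibatch; the loss computation itself is unchanged.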
