A Critical Assessment of State-of-the-Art in Entity Alignment

Paper title

A Critical Assessment of State-of-the-Art in Entity Alignment

Authors

Max Berrendorf, Ludwig Wacker, Evgeniy Faerman

Abstract

In this work, we perform an extensive investigation of two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs. We first carefully examine the benchmarking process and identify several shortcomings that make the results reported in the original works not always comparable. Furthermore, we suspect that it is common practice in the community to perform hyperparameter optimization directly on the test set, which reduces the informative value of the reported performance. We therefore select a representative sample of benchmarking datasets and describe their properties. We also examine different initializations for entity representations, since they are a decisive factor for model performance. Furthermore, we use a shared train/validation/test split for a fair evaluation setting in which we evaluate all methods on all datasets. Our evaluation yields several interesting findings. While SotA approaches perform better than baselines most of the time, they have difficulties when the dataset contains noise, which is the case in most real-life applications. Moreover, our ablation study shows that the features of SotA methods that are crucial for good performance are often different from those previously assumed. The code is available at https://github.com/mberr/ea-sota-comparison.
