Paper title
On Disambiguating Authors: Collaboration Network Reconstruction in a Bottom-up Manner
Paper authors
Abstract
The author disambiguation problem arises when different authors share the same name. Resolving it is a critical task in digital libraries such as DBLP, CiteULike, and CiteSeerX. Although state-of-the-art approaches include various paper-embedding-based methods that operate in a top-down manner, they focus primarily on the ego-network of a target name and overlook the low-quality collaborative relations that exist in that ego-network. These methods can therefore be suboptimal for disambiguating authors. In this paper, we model author disambiguation as a collaboration network reconstruction problem and propose an incremental and unsupervised author disambiguation method, IUAD, which operates in a bottom-up manner. First, we build a stable collaboration network based on stable collaborative relations. To further improve recall, we then build a probabilistic generative model to reconstruct the complete collaboration network. In addition, for newly published papers, we can incrementally determine who published them by computing only the posterior probabilities. We have conducted extensive experiments on a large-scale DBLP dataset to evaluate IUAD. The experimental results demonstrate that IUAD not only achieves promising performance but also significantly outperforms comparable baselines. Code is available at https://github.com/papergitgit/IUAD.
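The bottom-up first stage described in the abstract can be illustrated with a small sketch. This is not the IUAD implementation: the abstract does not define what makes a collaborative relation "stable", so the sketch assumes, purely for illustration, that a pair of co-author names is stable when it appears on at least `min_count` papers. The function names `stable_pairs` and `build_stable_network`, the `min_count` parameter, and the input layout (a dict from paper id to a list of author names) are all hypothetical. Every author mention starts as its own author, and two mentions of the same name are merged only when their papers share a stable pair.

```python
from collections import defaultdict
from itertools import combinations


def stable_pairs(papers, min_count=2):
    """Return co-author name pairs that co-occur on at least min_count papers.

    ASSUMPTION: this frequency threshold is an illustrative stand-in for
    "stable collaborative relations"; the abstract does not define them.
    """
    counts = defaultdict(int)
    for names in papers.values():
        for pair in combinations(sorted(set(names)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


def build_stable_network(papers, min_count=2):
    """Cluster (paper_id, name) mentions bottom-up using stable pairs only."""
    stable = stable_pairs(papers, min_count)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Bottom-up start: every mention is initially a distinct author.
    for pid, names in papers.items():
        for name in set(names):
            find((pid, name))

    # Merge same-name mentions across papers that share a stable pair.
    first_seen = {}
    for pid, names in papers.items():
        for pair in combinations(sorted(set(names)), 2):
            if pair not in stable:
                continue
            if pair in first_seen:
                for name in pair:
                    union((pid, name), (first_seen[pair], name))
            else:
                first_seen[pair] = pid

    clusters = defaultdict(set)
    for mention in list(parent):
        clusters[find(mention)].add(mention)
    return list(clusters.values())
```

Under this assumption, a name that appears with a one-off co-author stays in its own cluster. That keeps precision high at the cost of recall, which is the gap the abstract says the later probabilistic generative model is meant to close. That second stage is not sketched here because the abstract gives no details of the model.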