Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking

Paper Title

Improving Candidate Retrieval with Entity Profile Generation for Wikidata Entity Linking

Authors

Tuan Manh Lai, Heng Ji, ChengXiang Zhai

Abstract

Entity linking (EL) is the task of linking entity mentions in a document to referent entities in a knowledge base (KB). Many previous studies focus on Wikipedia-derived KBs. There is little work on EL over Wikidata, even though it is the most extensive crowdsourced KB. The scale of Wikidata can open up many new real-world applications, but its massive number of entities also makes EL challenging. To effectively narrow down the search space, we propose a novel candidate retrieval paradigm based on entity profiling. Wikidata entities and their textual fields are first indexed into a text search engine (e.g., Elasticsearch). During inference, given a mention and its context, we use a sequence-to-sequence (seq2seq) model to generate the profile of the target entity, which consists of its title and description. We use the profile to query the indexed search engine to retrieve candidate entities. Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary, enabling us to further design a highly effective hybrid method for candidate retrieval. Combined with a simple cross-attention reranker, our complete EL framework achieves state-of-the-art results on three Wikidata-based datasets and strong performance on TACKBP-2010.
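
The retrieval flow described in the abstract (index entity text, generate a profile for the mention, query the index, then combine with anchor-text dictionary candidates) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy entities and their ids, the in-memory token-overlap index standing in for a search engine such as Elasticsearch, the hard-coded `generate_profile` stub standing in for the seq2seq model, and the simple "dictionary first, then search results" merge. The abstract does not specify how the hybrid method combines the two sources.

```python
from collections import Counter

# Toy entities: id -> (title, description). Ids and text are illustrative only.
ENTITIES = {
    "Q1": ("Paris", "capital and largest city of France"),
    "Q2": ("Paris", "city in Lamar County, Texas, United States"),
    "Q3": ("Paris Hilton", "American media personality and businesswoman"),
}

# Hypothetical anchor-text dictionary: lowercased mention -> entity ids.
ANCHOR_DICT = {"paris": ["Q1", "Q3"]}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding commas and periods removed."""
    return [t.strip(".,").lower() for t in text.split() if t.strip(".,")]


def build_index(entities):
    """Map each entity id to a bag of tokens from its title and description."""
    return {
        eid: Counter(tokenize(title + " " + desc))
        for eid, (title, desc) in entities.items()
    }


def search(index, query, k=2):
    """Rank entities by token overlap with the query (stand-in for a real engine)."""
    q = Counter(tokenize(query))
    scored = []
    for eid, bag in index.items():
        score = sum(min(count, bag[tok]) for tok, count in q.items())
        if score > 0:
            scored.append((score, eid))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [eid for _, eid in scored[:k]]


def generate_profile(mention, context):
    """Stub for the seq2seq model: returns a fixed title-plus-description string.

    A real system would generate this text from the mention and its context.
    """
    if "Texas" in context:
        return mention + " city in Texas, United States"
    return mention + " capital of France"


def hybrid_candidates(mention, context, index, k=2):
    """Dictionary candidates first, then profile-search candidates, de-duplicated."""
    from_dict = ANCHOR_DICT.get(mention.lower(), [])
    from_search = search(index, generate_profile(mention, context), k)
    merged = []
    for eid in from_dict + from_search:
        if eid not in merged:
            merged.append(eid)
    return merged


index = build_index(ENTITIES)
candidates = hybrid_candidates("Paris", "He moved to Paris, Texas in 1990.", index)
print(candidates)
```

In this toy setup the dictionary alone misses the Texas city, while the profile query recovers it, which is the kind of complementarity the abstract attributes to the hybrid method.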
