Paper Title

EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery and Indexing

Authors

Nora Kassner, Fabio Petroni, Mikhail Plekhanov, Sebastian Riedel, Nicola Cancedda

Abstract

Existing work on entity linking mostly assumes that the reference knowledge base is complete, and therefore that all mentions can be linked. In practice this is hardly ever the case, because knowledge bases are incomplete and novel concepts arise constantly. This paper creates the Unknown Entity Discovery and Indexing (EDIN) benchmark, in which unknown entities, that is, entities with neither a description in the knowledge base nor labeled mentions, have to be integrated into an existing entity linking system. By contrasting EDIN with zero-shot entity linking, we provide insight into the additional challenges it poses. Building on dense-retrieval-based entity linking, we introduce the end-to-end EDIN pipeline, which detects, clusters, and indexes mentions of unknown entities in context. Experiments show that indexing a single embedding per entity, unifying the information of multiple mentions, works better than indexing mentions independently.
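
To make the contrast in the last sentence concrete, the following is a minimal sketch of the two indexing strategies, not the authors' implementation. It assumes that mention embeddings and cluster assignments are already available, and it uses mean pooling to merge the mentions of a cluster; the pooling choice, the function names, and the toy vectors are all illustrative assumptions rather than details taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_mention_index(mention_vecs, cluster_ids):
    """Mention-level indexing: one index entry per mention."""
    return [(cid, normalize(v)) for cid, v in zip(cluster_ids, mention_vecs)]


def build_entity_index(mention_vecs, cluster_ids):
    """Entity-level indexing: one pooled entry per cluster.

    Mean pooling is an assumption made for this sketch only.
    """
    groups = defaultdict(list)
    for cid, v in zip(cluster_ids, mention_vecs):
        groups[cid].append(v)
    index = []
    for cid, vecs in groups.items():
        mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        index.append((cid, normalize(mean)))
    return index


def link(query_vec, index):
    """Return the id of the index entry most similar to the query."""
    q = normalize(query_vec)
    best = max(index, key=lambda entry: sum(a * b for a, b in zip(q, entry[1])))
    return best[0]


# Hypothetical toy data: three mentions already grouped into two clusters.
mention_vecs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
cluster_ids = ["new_entity_A", "new_entity_A", "new_entity_B"]

mention_index = build_mention_index(mention_vecs, cluster_ids)
entity_index = build_entity_index(mention_vecs, cluster_ids)
predicted = link([0.9, 0.1], entity_index)
```

With entity-level indexing the index grows with the number of discovered entities rather than the number of mentions, and each entry reflects every mention in its cluster.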
