Paper Title

TABi: Type-Aware Bi-Encoders for Open-Domain Entity Retrieval

Authors

Megan Leszczynski, Daniel Y. Fu, Mayee F. Chen, Christopher Ré

Abstract

Entity retrieval--retrieving information about entity mentions in a query--is a key step in open-domain tasks, such as question answering or fact checking. However, state-of-the-art entity retrievers struggle to retrieve rare entities for ambiguous mentions due to biases towards popular entities. Incorporating knowledge graph types during training could help overcome popularity biases, but there are several challenges: (1) existing type-based retrieval methods require mention boundaries as input, but open-domain tasks run on unstructured text, (2) type-based methods should not compromise overall performance, and (3) type-based methods should be robust to noisy and missing types. In this work, we introduce TABi, a method to jointly train bi-encoders on knowledge graph types and unstructured text for entity retrieval for open-domain tasks. TABi leverages a type-enforced contrastive loss to encourage entities and queries of similar types to be close in the embedding space. TABi improves retrieval of rare entities on the Ambiguous Entity Retrieval (AmbER) sets, while maintaining strong overall retrieval performance on open-domain tasks in the KILT benchmark compared to state-of-the-art retrievers. TABi is also robust to incomplete type systems, improving rare entity retrieval over baselines with only 5% type coverage of the training dataset. We make our code publicly available at https://github.com/HazyResearch/tabi.
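
The abstract describes a type-enforced contrastive loss that pulls together embeddings of similar types. The snippet below is only a rough sketch of that general idea, written as a generic supervised-contrastive-style loss in plain Python. It is not the authors' implementation: the function name `type_contrastive_loss`, the `temperature` parameter, the single type label per item, and the toy 2-D embeddings are all assumptions made for illustration. See the linked repository for the actual method.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def type_contrastive_loss(embeddings, types, temperature=0.1):
    """Hypothetical sketch: for each anchor, the positives are the other
    items that share its type label; all other items act as negatives."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and types[j] == types[i]]
        if not positives:
            continue  # anchors with no same-type partner contribute nothing
        # Scaled cosine similarity between the anchor and every other item.
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        # Average negative log-probability of picking a same-type item.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Toy embeddings (assumed): the first two point one way, the last two another.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = type_contrastive_loss(toy, ["person", "person", "place", "place"])
misaligned = type_contrastive_loss(toy, ["person", "place", "person", "place"])
print(aligned < misaligned)
```

In this toy setup the loss is lower when items that sit close together in the embedding space also share a type label, which is the behavior a type-based contrastive objective is meant to reward.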
