Paper title
Design Challenges in Low-resource Cross-lingual Entity Linking
Paper authors
Paper abstract
Cross-lingual Entity Linking (XEL), the problem of grounding mentions of entities in foreign-language text to an English knowledge base such as Wikipedia, has seen a lot of research in recent years, with a range of promising techniques. However, current techniques do not rise to the challenges introduced by text in low-resource languages (LRL) and, surprisingly, fail to generalize to text not taken from Wikipedia, on which they are usually trained. This paper provides a thorough analysis of low-resource XEL techniques, focusing on the key step of identifying candidate English Wikipedia titles that correspond to a given foreign-language mention. Our analysis indicates that current methods are limited by their reliance on Wikipedia's interlanguage links and thus suffer when the foreign language's Wikipedia is small. We conclude that the LRL setting requires the use of outside-Wikipedia cross-lingual resources and present a simple yet effective zero-shot XEL system, QuEL, that utilizes search engine query logs. With experiments on 25 languages, QuEL shows an average increase of 25% in gold candidate recall and of 13% in end-to-end linking accuracy over state-of-the-art baselines.
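The abstract reports results in terms of gold candidate recall. As a rough illustration, this can be read as the fraction of mentions whose correct English Wikipedia title appears among the generated candidates. The sketch below assumes that reading; the function name, the mention identifiers, and the toy data are all hypothetical and are not taken from the paper or from QuEL.

```python
from typing import Dict, List


def gold_candidate_recall(
    candidates: Dict[str, List[str]], gold: Dict[str, str]
) -> float:
    """Fraction of mentions whose gold title is among their candidates.

    `candidates` maps a mention id to its candidate English titles;
    `gold` maps a mention id to the correct title. Both layouts are
    assumptions made for this sketch.
    """
    if not gold:
        return 0.0
    hits = sum(
        1
        for mention_id, title in gold.items()
        # A mention with no generated candidates counts as a miss.
        if title in candidates.get(mention_id, [])
    )
    return hits / len(gold)


# Toy data (hypothetical): four mentions, two of which are covered.
toy_candidates = {
    "m1": ["Paris", "Paris_Hilton"],
    "m2": ["Berlin"],
    "m3": [],
}
toy_gold = {"m1": "Paris", "m2": "Berlin", "m3": "Munich", "m4": "Rome"}

recall = gold_candidate_recall(toy_candidates, toy_gold)
```

Under this reading, a mention whose gold title is missing from the candidate list cannot be linked correctly by any later ranking step, which is why the abstract treats candidate generation as the key step.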