Paper Title

QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers

Authors

Aleksandr Perevalov, Dennis Diefenbach, Ricardo Usbeck, Andreas Both

Abstract

Providing the same experience to different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems, which provide access to Semantic Web data via a natural language interface. While following our research agenda on the multilingual aspect of accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages, provided by native speakers, and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, so that the usability and relevance of the dataset are strongly increased. To the best of our knowledge, five of the languages (Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian) have never been considered in the KGQA research community before. The latter two are considered "endangered" by UNESCO. We call the extended dataset QALD-9-plus and make it available online at https://github.com/Perevalov/qald_9_plus.
