Paper Title

Knowledge Graphs for Processing Scientific Data: Challenges and Prospects

Authors

Masoud Salehpour, Joseph G. Davis

Abstract

There is growing interest in the use of Knowledge Graphs (KGs) for the representation, exchange, and reuse of scientific data. While KGs offer the prospect of improving the infrastructure for working with scalable and reusable scholarly data consistent with the FAIR (Findability, Accessibility, Interoperability, and Reusability) principles, the state-of-the-art Data Management Systems (DMSs) for processing large KGs leave something to be desired. In this paper, we study the performance of several major DMSs in the context of querying KGs, with the goal of providing a fine-grained comparative analysis of DMSs representing each of the four major DMS types. We experimented with four well-known scientific KGs, namely Allie, Cellcycle, DrugBank, and LinkedSPL, against Virtuoso, Blazegraph, RDF-3X, and MongoDB as the representative DMSs. Our results suggest that the DMSs display limitations in processing complex queries on the KG datasets. Depending on the query type, the performance differentials can be several orders of magnitude. Also, no single DMS appears to offer consistently superior performance. We present an analysis of the underlying issues and outline two integrated approaches and proposals for resolving the problem.
