Paper title

Space/time-efficient RDF stores based on circular suffix sorting

Authors

Brisaboa, Nieves R., Cerdeira-Pena, Ana, de Bernardo, Guillermo, Fariña, Antonio, Navarro, Gonzalo

Abstract

In recent years, RDF has gained popularity as a format for the standardized publication and exchange of information on the Web of Data. In this paper we introduce RDFCSA, a data structure that self-indexes an RDF dataset in small space and supports efficient querying. RDFCSA regards the triples of the RDF store as short circular strings and applies suffix sorting to those strings, so that triple-pattern queries reduce to prefix searches on the string set. The RDF store is then represented compactly using a Compressed Suffix Array (CSA), a proven technology in text indexing that efficiently supports prefix searches. Our experiments show that RDFCSA provides a compact RDF representation, using less than 60% of the space required by the raw data, and yields fast and consistent query times when answering triple-pattern queries (a few microseconds per result). We also support join queries, a key component of most SPARQL queries. RDFCSA is shown to provide an excellent space/time tradeoff, typically using much less space than alternatives that compete in time.
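The reduction from triple patterns to prefix searches can be illustrated with a small sketch. This is not the RDFCSA structure itself: it stores every rotation of every triple in a plain sorted Python list instead of a Compressed Suffix Array, so it shows only the search idea, not the space savings. The function names `build_rotations` and `match`, the role tags `"S"`, `"P"`, `"O"`, and the use of `None` for an unbound variable are assumptions made for this illustration.

```python
from bisect import bisect_left


def build_rotations(triples):
    """Return the sorted list of all three rotations of each (s, p, o) triple.

    Each component is tagged with its role so that a subject value never
    compares equal to a predicate or object value.
    """
    rotations = []
    for s, p, o in triples:
        tagged = (("S", s), ("P", p), ("O", o))
        for shift in range(3):
            rotations.append(tagged[shift:] + tagged[:shift])
    rotations.sort()
    return rotations


def _restore(rotation):
    """Turn a rotated, tagged triple back into plain (s, p, o) order."""
    roles = dict(rotation)
    return (roles["S"], roles["P"], roles["O"])


def match(rotations, pattern):
    """Answer a triple pattern; None marks an unbound component."""
    s, p, o = pattern
    tagged = (("S", s), ("P", p), ("O", o))
    n_bound = sum(value is not None for _, value in tagged)

    if n_bound == 0:
        # Fully unbound pattern: report each triple once, via the
        # rotation that starts at its subject.
        return [_restore(r) for r in rotations if r[0][0] == "S"]

    # Pick the rotation of the pattern in which all bound components
    # come first; on a circular string of length 3 one always exists.
    prefix = ()
    for shift in range(3):
        rotated = tagged[shift:] + tagged[:shift]
        candidate = []
        for component in rotated:
            if component[1] is None:
                break
            candidate.append(component)
        if len(candidate) == n_bound:
            prefix = tuple(candidate)
            break

    # Binary search for the first rotation that starts with the prefix,
    # then scan forward while the prefix still matches.
    k = len(prefix)
    i = bisect_left(rotations, prefix)
    results = []
    while i < len(rotations) and rotations[i][:k] == prefix:
        results.append(_restore(rotations[i]))
        i += 1
    return results
```

With this layout, a pattern such as `("alice", None, "bob")` is rotated to start at the object, so its two bound components form the prefix `(O=bob, S=alice)` and all matches sit in one contiguous range of the sorted list. This sketch triples the stored data; a compressed suffix array over the same circular strings avoids that cost.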