通过偏置值编码进行稳健有效的排序

论文标题

通过偏置值编码进行稳健有效的排序

Robust and Efficient Sorting with Offset-Value Coding

论文作者

Do, Thanh, Graefe, Goetz

论文摘要

排序和搜索是数据库查询处理的大部分，例如，以索引创建，索引维护和索引查找的形式；比较钥匙对是分类和搜索努力的重要组成部分。我们曾在数十年，被忽视的，有效的技术进行快速比较和快速分类（尤其是偏移值编码）方面进行了简单，有效的实施。在此过程中，我们发生了它与运行文件中前缀截断的互惠关系，以及在行和列形式存储结构中的压缩技术的二元性，即前缀截断和领先密钥列的运行长度编码。我们还发现了与分类流的消费者（例如合并平行流，流入集合和合并加入）的有益关系。我们在Google的NAPA和F1查询系统以及对性能和可扩展性的实验评估中报告了我们的实施。

Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup; and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have worked on simple, efficient implementations of decades-old, neglected, effective techniques for fast comparisons and fast sorting, in particular offset-value coding. In the process, we happened upon its mutually beneficial relationship with prefix truncation in run files as well as the duality of compression techniques in row- and column-format storage structures, namely prefix truncation and run-length encoding of leading key columns. We also found a beneficial relationship with consumers of sorted streams, e.g., merging parallel streams, in-stream aggregation, and merge join. We report on our implementation in the context of Google's Napa and F1 Query systems as well as an experimental evaluation of performance and scalability.

下载PDF全文

下载文献需遵守相关版权规定

论文标题