There is No Such Thing as an "Index"! or: The next 500 Indexing Papers

Paper title

There is No Such Thing as an "Index"! or: The next 500 Indexing Papers

Authors

Jens Dittrich, Joris Nix, Christian Schön

Abstract

Index structures are a building block of query processing and computer science in general. Since the dawn of computer technology there have been index structures. And since then, a myriad of index structures are being invented and published each and every year. In this paper we argue that the very idea of "inventing an index" is a misleading concept in the first place. It is the analogue of "inventing a physical query plan". This paper is a paradigm shift in which we propose to drop the idea to handcraft index structures (as done for binary search trees over B-trees to any form of learned index) altogether. We present a new automatic index breeding framework coined Genetic Generic Generation of Index Structures (GENE). It is based on the observation that almost all index structures are assembled along three principal dimensions: (1) structural building blocks, e.g., a B-tree is assembled from two different structural node types (inner and leaf nodes), (2) a couple of invariants, e.g., for a B-tree all paths have the same length, and (3) decisions on the internal layout of nodes (row or column layout, etc.). We propose a generic indexing framework that can mimic many existing index structures along those dimensions. Based on that framework we propose a generic genetic index generation algorithm that, given a workload and an optimization goal, can automatically assemble and mutate, in other words 'breed' new index structure 'species'. In our experiments we follow multiple goals. We reexamine some good old wisdom from database technology. Given a specific workload, will GENE even breed an index that is equivalent to what our textbooks and papers currently recommend for such a workload? Or can we do even more? Our initial results strongly indicate that generated indexes are the next step in designing index structures.
