Title
To pipeline or not to pipeline, that is the question
Authors
Abstract
In designing query processing primitives, a crucial design choice is the method for transferring data between two operators in a query plan. As we were considering this critical design mechanism for an in-memory database system that we are building, we quickly realized that, surprisingly, there is no clear definition of this concept. Papers are full of ad hoc uses of terms like pipelining and blocking, but because these terms are not crisply defined, it is hard to fully understand the results attributed to them. To address this limitation, we introduce a clear terminology for reasoning about data transfer between operators in a query pipeline. We show that there is no clear definition of pipelining and blocking; rather, there is a full spectrum of techniques based on a simple concept called unit-of-transfer. Next, we develop an analytical model for inter-operator communication and highlight the key parameters that impact performance for in-memory databases. We then apply this model to the system we are designing and highlight the insights we gathered from this exercise. We find that the gap between pipelining and non-pipelining query execution with respect to key factors such as performance and memory footprint is quite narrow, and thus system designers should likely rethink the notion of pipelining vs. blocking for in-memory database systems.
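To make the unit-of-transfer idea concrete, here is a toy sketch in Python. It is not the paper's analytical model, which the abstract does not spell out. It assumes a hypothetical two-operator plan (a filtering scan feeding a sum) in which the producer hands rows to the consumer in batches of `unit` rows. A unit of 1 approximates tuple-at-a-time pipelining, while a unit at least as large as the input approximates fully blocking execution. The names `scan_filter`, `run_plan`, the even-number predicate, and the `unit` parameter are all illustrative assumptions.

```python
from typing import Iterator, List, Tuple


def scan_filter(table: List[int], unit: int) -> Iterator[List[int]]:
    """Hypothetical producer: keeps even rows and emits them in batches of at most `unit` rows."""
    batch: List[int] = []
    for row in table:
        if row % 2 == 0:  # illustrative predicate, not from the paper
            batch.append(row)
            if len(batch) == unit:
                yield batch
                batch = []
    if batch:  # flush the final, possibly partial, batch
        yield batch


def run_plan(table: List[int], unit: int) -> Tuple[int, int, int]:
    """Hypothetical consumer: sums the filtered rows.

    Returns (result, number of transfers, largest batch held between the operators).
    """
    if unit < 1:
        raise ValueError("unit-of-transfer must be at least one row")
    total = 0
    transfers = 0
    peak_buffer = 0
    for batch in scan_filter(table, unit):
        transfers += 1
        peak_buffer = max(peak_buffer, len(batch))
        total += sum(batch)
    return total, transfers, peak_buffer


if __name__ == "__main__":
    table = list(range(1000))
    # unit = 1 ~ tuple-at-a-time pipelining; unit = len(table) ~ fully blocking
    for unit in (1, 64, len(table)):
        print(unit, run_plan(table, unit))
    # 1 (249500, 500, 1)
    # 64 (249500, 8, 64)
    # 1000 (249500, 1, 500)
```

The query result is identical at every point on the spectrum; only the number of handoffs and the amount of intermediate data held between the operators change. In this toy setting, that is the trade-off the unit-of-transfer parameter controls.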