RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)

Paper Title

RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)

Authors

Zhu, Hang; Kaffes, Kostis; Chen, Zixu; Liu, Zhenming; Kozyrakis, Christos; Stoica, Ion; Jin, Xin

Abstract

Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires these systems to scale out to multiple servers in a rack. We present RackSched, the first rack-level microsecond-scale scheduler that provides the abstraction of a rack-scale computer (i.e., a huge server with hundreds to thousands of cores) to an external service with network-system co-design. The core of RackSched is a two-layer scheduling framework that integrates inter-server scheduling in the top-of-rack (ToR) switch with intra-server scheduling in each server. We use a combination of analytical results and simulations to show that it provides near-optimal performance, comparable to that of centralized scheduling policies, and is robust for both low-dispersion and high-dispersion workloads. We design a custom switch data plane for the inter-server scheduler, which realizes power-of-k-choices, ensures request affinity, and tracks server loads accurately and efficiently. We implement a RackSched prototype on a cluster of commodity servers connected by a Barefoot Tofino switch. End-to-end experiments on a twelve-server testbed show that RackSched improves throughput by up to 1.44x and scales out throughput nearly linearly, while maintaining the same tail latency as one server until the system is saturated.
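
The three duties the abstract assigns to the inter-server scheduler (power-of-k-choices, request affinity, and load tracking) can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not RackSched's switch data plane: the class name `ToRScheduler`, the per-server load counters, the request-id affinity table, and the assumption that each server piggybacks its current outstanding-request count on reply packets are hypothetical simplifications chosen for illustration.

```python
import random


class ToRScheduler:
    """Hypothetical model of the inter-server layer running in a ToR switch.

    Assumptions (illustrative, not taken from the RackSched paper): servers
    report their number of outstanding requests in reply packets, and the
    switch keeps one load counter per server plus a request-id -> server
    affinity table for in-flight requests.
    """

    def __init__(self, num_servers: int, k: int = 2, seed: int = 0) -> None:
        if not 1 <= k <= num_servers:
            raise ValueError("k must satisfy 1 <= k <= num_servers")
        self.num_servers = num_servers
        self.k = k
        self.loads = [0] * num_servers  # switch-side load estimate per server
        self.affinity = {}              # in-flight request id -> server index
        self.rng = random.Random(seed)

    def dispatch(self, req_id: int) -> int:
        """Return the server that should receive a packet of request req_id."""
        # Request affinity: every packet of a multi-packet request must reach
        # the server that received the request's first packet.
        if req_id in self.affinity:
            return self.affinity[req_id]
        # Power-of-k-choices: sample k distinct servers, keep the least loaded.
        candidates = self.rng.sample(range(self.num_servers), self.k)
        server = min(candidates, key=lambda s: self.loads[s])
        self.affinity[req_id] = server
        # Count the request right away so a burst of new requests does not
        # all pick the same server before any reply has arrived.
        self.loads[server] += 1
        return server

    def on_reply(self, req_id: int, reported_load: int) -> None:
        """Process a reply for req_id that carries the server's current load."""
        server = self.affinity.pop(req_id)
        # Replace the switch's estimate with the value reported by the server.
        self.loads[server] = reported_load


sched = ToRScheduler(num_servers=12, k=2, seed=1)
first = sched.dispatch(req_id=1)
second = sched.dispatch(req_id=1)      # a later packet of the same request
sched.on_reply(req_id=1, reported_load=0)
print(first == second, sched.loads[first])
```

Sampling only k servers keeps the per-packet work small and fixed, while still steering most requests away from busy servers; setting k equal to the number of servers turns the policy into join-the-shortest-queue over the switch's (possibly stale) load estimates. A real switch would hold this state in fixed-size register arrays rather than a Python dictionary, so the model above should be read as a description of the logic, not of the hardware layout.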
