State-Machine Replication for Planet-Scale Systems (Extended Version)

Paper title

State-Machine Replication for Planet-Scale Systems (Extended Version)

Authors

Vitor Enes, Carlos Baquero, Tuanir França Rezende, Alexey Gotsman, Matthieu Perrin, Pierre Sutra

Abstract

Online applications now routinely replicate their data at multiple sites around the world. In this paper we present Atlas, the first state-machine replication protocol tailored for such planet-scale systems. Atlas does not rely on a distinguished leader, so clients enjoy the same quality of service independently of their geographical locations. Furthermore, client-perceived latency improves as we add sites closer to clients. To achieve this, Atlas minimizes the size of its quorums using an observation that concurrent data center failures are rare. It also processes a high percentage of accesses in a single round trip, even when these conflict. We experimentally demonstrate that Atlas consistently outperforms state-of-the-art protocols in planet-scale scenarios. In particular, Atlas is up to two times faster than Flexible Paxos with identical failure assumptions, and more than doubles the performance of Egalitarian Paxos in the YCSB benchmark.
