Paper Title

Deterministic Data Distribution for Efficient Recovery in Erasure-Coded Storage Systems

Authors

Liangliang Xu, Min Lyu, Zhipeng Li, Yongkun Li, Yinlong Xu

Abstract

Because individual commodity components are unreliable, failures are common in large-scale distributed storage systems. Erasure codes are widely deployed in practical storage systems to provide fault tolerance with low storage overhead. However, random data distribution (RDD), which is commonly used in erasure-coded storage systems, induces heavy cross-rack traffic, load imbalance, and random access, all of which adversely affect failure recovery. In this paper, we use orthogonal arrays to define a Deterministic Data Distribution ($D^3$) that uniformly distributes data and parity blocks among nodes, and we propose an efficient failure recovery approach based on $D^3$ that minimizes cross-rack repair traffic for a single node failure. Thanks to the uniformity of $D^3$, the proposed recovery approach balances repair traffic not only among nodes within a rack but also among racks. We implement $D^3$ over Reed-Solomon (RS) codes and Locally Repairable Codes (LRCs) in the Hadoop Distributed File System (HDFS) on a cluster of 28 machines. Compared with RDD, our experiments show that $D^3$ significantly speeds up failure recovery, by up to 2.49 times for RS codes and 1.38 times for LRCs. Moreover, $D^3$ supports front-end applications better than RDD in both the normal and recovery states.
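The kind of uniformity that an orthogonal array provides can be illustrated with a minimal sketch. It assumes the classic linear orthogonal array of strength 2 over a prime q and a hypothetical mapping: each array row (x, y) with y ≠ 0 becomes one stripe, and block j of that stripe is stored on node (x + j·y) mod q. It ignores racks entirely and is not the paper's $D^3$ construction or its repair algorithm; the function names `oa_placement`, `blocks_per_node`, `repair_sources`, and `pairs_balanced` are illustrative only.

```python
from collections import Counter


def oa_placement(q: int, n: int) -> list[list[int]]:
    """Hypothetical placement derived from the linear OA(q^2, q+1, q, 2), q prime.

    Each OA row (x, y) with y != 0 becomes one stripe; block j of that stripe
    goes to node (x + j*y) mod q. Since y != 0 and q is prime, the n <= q
    blocks of a stripe always land on n distinct nodes.
    """
    assert n <= q, "stripe width cannot exceed the number of nodes"
    return [[(x + j * y) % q for j in range(n)]
            for x in range(q) for y in range(1, q)]


def blocks_per_node(stripes: list[list[int]], q: int) -> list[int]:
    """Storage balance: how many blocks each node stores."""
    load = Counter(node for s in stripes for node in s)
    return [load[v] for v in range(q)]


def repair_sources(stripes: list[list[int]], failed: int, q: int) -> list[int]:
    """Surviving blocks each node holds from stripes that lost a block on `failed`.

    This counts every surviving block of an affected stripe; an actual RS
    repair would read only k of them per stripe.
    """
    load = Counter()
    for s in stripes:
        if failed in s:
            for node in s:
                if node != failed:
                    load[node] += 1
    return [load[v] for v in range(q) if v != failed]


def pairs_balanced(stripes: list[list[int]], q: int, n: int) -> bool:
    """Strength-2 check: for block positions i != j, every ordered pair of
    distinct nodes (u, v) hosts blocks i and j of exactly one stripe."""
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            pairs = Counter((s[i], s[j]) for s in stripes)
            if len(pairs) != q * (q - 1) or any(c != 1 for c in pairs.values()):
                return False
    return True


if __name__ == "__main__":
    q, n = 7, 4  # 7 nodes, stripes of width 4 (hypothetical parameters)
    stripes = oa_placement(q, n)
    print(len(stripes))                            # → 42
    print(blocks_per_node(stripes, q))             # → [24, 24, 24, 24, 24, 24, 24]
    print(repair_sources(stripes, failed=0, q=q))  # → [12, 12, 12, 12, 12, 12]
    print(pairs_balanced(stripes, q, n))           # → True
```

Because every ordered pair of distinct nodes co-occurs exactly once for each pair of block positions, the blocks that survive a single node failure in this toy layout are spread evenly over all remaining nodes. A random placement offers no such guarantee, which is the contrast the abstract draws between $D^3$ and RDD.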
