Paper title
Bandwidth Cost of Code Conversions in Distributed Storage: Fundamental Limits and Optimal Constructions
Authors
Abstract
Erasure codes have become an integral part of distributed storage systems as a tool for providing data reliability and durability under the constant threat of device failures. In such systems, an $[n, k]$ code over a finite field $\mathbb{F}_q$ encodes $k$ message symbols into $n$ codeword symbols from $\mathbb{F}_q$, which are then stored on $n$ different nodes in the system. Recent work has shown that significant savings in storage space can be obtained by tuning $n$ and $k$ to variations in device failure rates. Such tuning necessitates code conversion: the process of converting data already encoded under an initial $[n^I, k^I]$ code to its equivalent under a final $[n^F, k^F]$ code. The default approach to conversion is to re-encode the data, which places a significant burden on system resources. Convertible codes are a recently proposed class of codes for enabling resource-efficient conversions. Existing work on convertible codes has focused on minimizing access cost, i.e., the number of code symbols accessed during conversion. Bandwidth, which corresponds to the amount of data read and transferred, is another important resource to optimize. In this paper, we initiate the study of the fundamental limits on the bandwidth used during code conversion and present constructions for bandwidth-optimal convertible codes. First, we model the code conversion problem using network information flow graphs with variable-capacity edges. Second, focusing on MDS codes and an important parameter regime called the merge regime, we derive tight lower bounds on the bandwidth cost of conversion. The derived bounds show that bandwidth cost can be significantly reduced even in regimes where access cost cannot be reduced compared to the default approach. Third, we present a new construction for MDS convertible codes that matches the proposed lower bound and is thus bandwidth-optimal during conversion.
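To make the notion of code conversion concrete, the following is a minimal toy sketch and not the construction proposed in the paper. It assumes the simplest possible MDS code, a single-parity $[k+1, k]$ code whose parity symbol is the XOR of the data symbols, and merges two $[3, 2]$ codewords into one $[5, 4]$ codeword. All function names are hypothetical and chosen for illustration. The sketch only counts symbols read, so it illustrates why re-encoding is wasteful; it does not model the finer-grained bandwidth savings studied in the paper.

```python
from functools import reduce
from operator import xor


def encode_single_parity(data):
    """Encode k symbols into a [k+1, k] codeword by appending their XOR."""
    return list(data) + [reduce(xor, data, 0)]


def merge_by_reencoding(codewords):
    """Default approach: read every data symbol, then recompute the parity."""
    data = [s for cw in codewords for s in cw[:-1]]
    symbols_read = len(data)
    return encode_single_parity(data), symbols_read


def merge_by_conversion(codewords):
    """Read only the initial parities; their XOR is the final parity."""
    parities = [cw[-1] for cw in codewords]
    symbols_read = len(parities)
    final_parity = reduce(xor, parities, 0)
    # Data symbols stay in place on their nodes; they are gathered here
    # only to display the final codeword, not counted as reads.
    data = [s for cw in codewords for s in cw[:-1]]
    return data + [final_parity], symbols_read


# Two initial [3, 2] codewords merged into one final [5, 4] codeword.
initial = [encode_single_parity([0x12, 0x34]),
           encode_single_parity([0x56, 0x78])]
reencoded, read_default = merge_by_reencoding(initial)
converted, read_conversion = merge_by_conversion(initial)

assert reencoded == converted
print(read_default, read_conversion)  # prints "4 2"
```

Both approaches produce the same final codeword, but the conversion reads two symbols instead of four. The single-parity case is chosen only because it is easy to verify by hand; codes with more parity symbols require the more careful constructions that the abstract refers to.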