Title
Approximate Gradient Coding with Optimal Decoding
Authors
Abstract
In distributed optimization problems, a technique called gradient coding, which involves replicating data points, has been used to mitigate the effect of straggling machines. Recent work has studied approximate gradient coding, which concerns coding schemes where the replication factor of the data is too low to recover the full gradient exactly. Our work is motivated by the challenge of creating approximate gradient coding schemes that simultaneously work well in both the adversarial and stochastic models. To that end, we introduce novel approximate gradient codes based on expander graphs, in which each machine receives exactly two blocks of data points. We analyze the decoding error in both the random and adversarial straggler settings, when optimal decoding coefficients are used. We show that in the random setting, our schemes achieve an error to the gradient that decays exponentially in the replication factor. In the adversarial setting, the error is nearly a factor of two smaller than that of any existing code with similar performance in the random setting. We show convergence bounds in both the random and adversarial settings for gradient descent under standard assumptions using our codes. In the random setting, our convergence rate improves upon black-box bounds. In the adversarial setting, we show that gradient descent can converge down to a noise floor that scales linearly with the adversarial error to the gradient. We demonstrate empirically that our schemes achieve near-optimal error in the random setting and converge faster than algorithms which do not use the optimal decoding coefficients.
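To make the setup concrete, the following is a minimal NumPy sketch of approximate gradient coding with least-squares-optimal decoding coefficients. It uses a simple cycle assignment (each machine holds two blocks) purely for illustration; the paper's actual codes are built from expander graphs, which are not constructed here, and the straggler probability and dimensions are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 12  # number of data blocks (= number of machines in this sketch)
d = 5   # gradient dimension

# Per-block gradients (stand-ins for gradients computed on each data block).
G = rng.standard_normal((n, d))
full_gradient = G.sum(axis=0)

# Illustrative assignment: machine i holds blocks i and (i+1) mod n,
# i.e., a 2-regular cycle graph rather than the paper's expander graphs.
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = 1.0
    A[i, (i + 1) % n] = 1.0

# Each machine returns the sum of the gradients of its two blocks.
machine_outputs = A @ G  # shape (n, d)

# Random stragglers: each machine fails independently with probability p.
p = 0.25
alive = rng.random(n) > p
A_alive = A[alive]
out_alive = machine_outputs[alive]

# Optimal decoding coefficients: choose w to minimize
# || w^T A_alive - 1^T ||_2 via least squares, where 1 is all-ones.
w, *_ = np.linalg.lstsq(A_alive.T, np.ones(n), rcond=None)
approx_gradient = w @ out_alive

# Naive decoding for comparison: uniform weights rescaled so the total
# "mass" matches the n blocks (each surviving machine covers 2 blocks).
w_naive = np.full(A_alive.shape[0], n / (2 * A_alive.shape[0]))
naive_gradient = w_naive @ out_alive

err_opt = np.linalg.norm(approx_gradient - full_gradient)
err_naive = np.linalg.norm(naive_gradient - full_gradient)
print(f"optimal-decoding error: {err_opt:.4f}, naive error: {err_naive:.4f}")
```

By construction, the least-squares coefficients minimize the distance between the decoded combination of surviving rows and the all-ones vector, so the optimal decoder's coefficient residual is never larger than the naive decoder's; with no stragglers, uniform weights of 1/2 recover the full gradient exactly on this cycle code.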