Paper Title

Byzantine Fault-Tolerant Distributed Machine Learning Using Stochastic Gradient Descent (SGD) and Norm-Based Comparative Gradient Elimination (CGE)

Paper Authors

Nirupam Gupta, Shuo Liu, Nitin H. Vaidya

Paper Abstract

This paper considers the Byzantine fault-tolerance problem in the distributed stochastic gradient descent (D-SGD) method, a popular algorithm for distributed multi-agent machine learning. In this problem, each agent samples data points independently from a certain data-generating distribution. In the fault-free case, the D-SGD method allows all the agents to learn a mathematical model best fitting the data collectively sampled by all agents. We consider the case when a fraction of agents may be Byzantine faulty. Such faulty agents may not follow a prescribed algorithm correctly, and may render the traditional D-SGD method ineffective by sharing arbitrary incorrect stochastic gradients. We propose a norm-based gradient-filter, named comparative gradient elimination (CGE), that robustifies the D-SGD method against Byzantine agents. We show that the CGE gradient-filter guarantees fault-tolerance against a bounded fraction of Byzantine agents under standard stochastic assumptions, and is computationally simpler than many existing gradient-filters such as multi-KRUM, geometric median-of-means, and the spectral filters. We empirically show, by simulating distributed learning on neural networks, that the fault-tolerance of CGE is comparable to that of existing gradient-filters. We also empirically show that exponential averaging of stochastic gradients improves the fault-tolerance of a generic gradient-filter.
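
As a concrete illustration of the CGE idea described in the abstract, the snippet below sketches a norm-based filter: the server sorts the received gradients by Euclidean norm, eliminates the f largest, and aggregates the rest. This is a minimal NumPy sketch, not the authors' implementation; the function names, the use of a plain average over the retained gradients, and the smoothing factor `beta` for exponential averaging are illustrative assumptions not specified in the abstract.

```python
import numpy as np

def cge_filter(gradients, f):
    """Comparative gradient elimination (CGE), per the abstract:
    sort the n received stochastic gradients by Euclidean norm,
    eliminate the f gradients with the largest norms, and aggregate
    the remaining n - f. Here f is the assumed upper bound on the
    number of Byzantine agents."""
    norms = np.array([np.linalg.norm(g) for g in gradients])
    keep = np.argsort(norms)[: len(gradients) - f]  # indices of the n - f smallest norms
    return np.mean([gradients[i] for i in keep], axis=0)

def exponential_average(prev, grad, beta=0.9):
    """Exponential averaging of an agent's stochastic gradients,
    which the abstract reports improves the fault-tolerance of a
    generic gradient-filter. beta is a hypothetical smoothing
    factor chosen for illustration."""
    return beta * prev + (1.0 - beta) * grad

# Hypothetical server-side step of D-SGD with the CGE filter:
# each entry of `received` is one agent's (possibly Byzantine) gradient.
rng = np.random.default_rng(0)
received = [rng.normal(size=10) for _ in range(8)]  # 8 agents
received[0] = 1e3 * rng.normal(size=10)             # one faulty, large-norm gradient
step = cge_filter(received, f=1)                    # faulty gradient is eliminated
```

The per-iteration cost is dominated by computing n gradient norms and sorting them, i.e. roughly O(n(d + log n)) for d-dimensional gradients, which is the source of the computational simplicity claimed relative to filters such as multi-KRUM that rely on pairwise-distance computations scaling quadratically in n.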
