Paper Title
Byzantine-Resilient High-Dimensional Federated Learning
Paper Authors
Paper Abstract
We study stochastic gradient descent (SGD) with local iterations in the presence of malicious/Byzantine clients, motivated by federated learning. Instead of communicating with the central server in every iteration, the clients maintain their local models, which they update by taking several SGD iterations based on their own datasets and then communicate the net update to the server, thereby achieving communication efficiency. Furthermore, only a subset of clients communicates with the server, and this subset may differ across synchronization times. The Byzantine clients may collaborate and send arbitrary vectors to the server to disrupt the learning process. To combat the adversary, we employ an efficient high-dimensional robust mean estimation algorithm from Steinhardt et al.~\cite[ITCS 2018]{Resilience_SCV18} at the server to filter out corrupt vectors; and to analyze the outlier-filtering procedure, we develop a novel matrix concentration result that may be of independent interest. We provide convergence analyses for strongly-convex and non-convex smooth objectives in the heterogeneous data setting, where different clients may have different local datasets, and we do not make any probabilistic assumptions on data generation. We believe that ours is the first Byzantine-resilient algorithm and analysis with local iterations. We derive our convergence results under minimal assumptions of bounded variance for SGD and bounded gradient dissimilarity (which captures heterogeneity among local datasets). We also extend our results to the case when clients compute full-batch gradients.
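To make the setup concrete, the following is a minimal, self-contained sketch in Python/NumPy of the overall scheme the abstract describes: clients run several local SGD steps on their own heterogeneous data, only a sampled subset sends its net update each round, Byzantine clients send arbitrary vectors, and the server aggregates robustly. This is not the authors' algorithm: a coordinate-wise trimmed mean is used here as a simple stand-in for the high-dimensional filtering procedure of Steinhardt et al., and all function names and parameters (local_sgd, trimmed_mean, the learning rate, the trimming fraction) are hypothetical choices for illustration.

import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, steps=5, lr=0.01):
    # Run `steps` SGD iterations of a least-squares objective on one client's local data.
    w = w.copy()
    for _ in range(steps):
        i = rng.integers(len(y))
        grad = (X[i] @ w - y[i]) * X[i]
        w = w - lr * grad
    return w

def trimmed_mean(updates, trim_frac=0.3):
    # Coordinate-wise trimmed mean: a simple robust aggregator used here as a
    # stand-in for the outlier-filtering procedure cited in the abstract.
    U = np.sort(np.stack(updates), axis=0)
    k = int(trim_frac * len(updates))
    return U[k:len(updates) - k].mean(axis=0)

# Synthetic heterogeneous clients; the first few behave Byzantine.
d, n_clients, n_byz, sample_size = 10, 20, 3, 10
w_true = rng.normal(size=d)
clients = []
for c in range(n_clients):
    X = rng.normal(loc=0.05 * c, size=(50, d))   # different data distribution per client
    y = X @ w_true + 0.1 * rng.normal(size=50)
    clients.append((X, y))

w = np.zeros(d)
for rnd in range(50):
    # Partial participation: only a sampled subset communicates this round.
    sampled = rng.choice(n_clients, size=sample_size, replace=False)
    updates = []
    for c in sampled:
        if c < n_byz:
            # Byzantine client: sends an arbitrary (here, large random) vector.
            updates.append(rng.normal(scale=100.0, size=d))
        else:
            X, y = clients[c]
            w_local = local_sgd(w, X, y)
            updates.append(w_local - w)           # net update over the local iterations
    w = w + trimmed_mean(updates)                 # robust aggregation at the server

print("distance to w_true:", np.linalg.norm(w - w_true))

Running this sketch shows the server model approaching w_true despite the corrupt updates, which is the qualitative behavior the convergence analysis in the paper makes precise under bounded variance and bounded gradient dissimilarity.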