DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs

Paper Title

DR-DSGD: A Distributionally Robust Decentralized Learning Algorithm over Graphs

Authors

Chaouki Ben Issaid, Anis Elgabli, Mehdi Bennis

Abstract

In this paper, we propose to solve a regularized distributionally robust learning problem in the decentralized setting, taking into account the data distribution shift. By adding a Kullback-Leibler regularization function to the robust min-max optimization problem, the learning problem can be reduced to a modified robust minimization problem and solved efficiently. Leveraging the newly formulated optimization problem, we propose a robust version of Decentralized Stochastic Gradient Descent (DSGD), coined Distributionally Robust Decentralized Stochastic Gradient Descent (DR-DSGD). Under some mild assumptions and provided that the regularization parameter is larger than one, we theoretically prove that DR-DSGD achieves a convergence rate of $\mathcal{O}\left(1/\sqrt{KT} + K/T\right)$, where $K$ is the number of devices and $T$ is the number of iterations. Simulation results show that our proposed algorithm can improve the worst distribution test accuracy by up to $10\%$. Moreover, DR-DSGD is more communication-efficient than DSGD since it requires fewer communication rounds (up to $20$ times fewer) to achieve the same worst distribution test accuracy target. Furthermore, the conducted experiments reveal that DR-DSGD results in a fairer performance across devices in terms of test accuracy.
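
The reduction from a min-max problem to a plain minimization mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the regularizer is the KL divergence between the device weights $\lambda$ and the uniform distribution, scaled by a parameter $\mu$, in which case the inner maximization has the well-known closed form $\mu \log\left(\frac{1}{K}\sum_i \exp(f_i/\mu)\right)$ with maximizer $\lambda_i \propto \exp(f_i/\mu)$. The function names, the toy loss values, and the choice $\mu = 2$ are all illustrative assumptions, and the paper's exact formulation may differ.

```python
import numpy as np

def kl_to_uniform(lam):
    """KL divergence between simplex weights lam and the uniform distribution."""
    K = lam.size
    nz = lam > 0  # 0 * log(0) is treated as 0
    return float(np.sum(lam[nz] * np.log(lam[nz] * K)))

def regularized_objective(lam, losses, mu):
    """Inner objective: weighted loss minus mu times the KL penalty (assumed form)."""
    return float(lam @ losses) - mu * kl_to_uniform(lam)

def soft_worst_case(losses, mu):
    """Closed form of the inner maximum: mu * log(mean(exp(losses / mu)))."""
    z = losses / mu
    m = z.max()  # subtract the max for numerical stability
    return float(mu * (m + np.log(np.mean(np.exp(z - m)))))

def worst_case_weights(losses, mu):
    """Maximizing weights: a softmax of the per-device losses scaled by 1 / mu."""
    z = losses / mu
    w = np.exp(z - z.max())
    return w / w.sum()

# Hypothetical per-device losses (toy values, not results from the paper).
losses = np.array([0.2, 0.9, 0.4, 1.5])
mu = 2.0  # illustrative regularization parameter, chosen larger than one

lam_star = worst_case_weights(losses, mu)
closed_form = soft_worst_case(losses, mu)

print("worst-case weights:", lam_star)
print("objective at those weights:", regularized_objective(lam_star, losses, mu))
print("closed-form value:", closed_form)
```

Under these assumptions the value always lies between the average loss and the maximum loss. A large $\mu$ pulls it toward the average (the usual non-robust objective), while a small $\mu$ pulls it toward the worst device's loss.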
