Paper Title
Convergence of Update Aware Device Scheduling for Federated Learning at the Wireless Edge
Paper Authors
Abstract
We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, while each participating device must compress its model update to fit its link capacity. We design novel scheduling and resource allocation policies that decide on the subset of the devices to transmit at each round, and how the resources should be allocated among the participating devices, not only based on their channel conditions, but also on the significance of their local model updates. We then establish convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides a better long-term performance than scheduling policies based only on either of the two metrics individually. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.
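The round structure described above can be illustrated with a minimal sketch: devices compute local updates, the PS scheduler scores each device by combining a proxy for channel quality with the significance of its update (here, its l2 norm), and the selected devices send top-k-sparsified updates sized to their link capacity. The scoring rule, the capacity model, and all function names below are illustrative assumptions, not the paper's exact policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1):
    """One local gradient step on a linear-regression loss (illustrative task)."""
    grad = X.T @ (X @ w - y) / len(y)
    return -lr * grad  # the model-update vector the device would transmit

def schedule(updates, capacities, k):
    """Illustrative update-aware rule: score = update norm x channel capacity,
    then keep the k highest-scoring devices."""
    scores = [np.linalg.norm(u) * c for u, c in zip(updates, capacities)]
    return np.argsort(scores)[-k:]

def compress(update, budget):
    """Top-k sparsification: keep only the `budget` largest-magnitude entries,
    standing in for compressing the update to the link capacity."""
    out = np.zeros_like(update)
    idx = np.argsort(np.abs(update))[-budget:]
    out[idx] = update[idx]
    return out

# --- one FL round with M devices (assumed toy dimensions) ---
M, d, k = 10, 20, 3
w = np.zeros(d)
data = [(rng.normal(size=(30, d)), rng.normal(size=30)) for _ in range(M)]
capacities = rng.uniform(0.2, 1.0, size=M)  # proxy for per-device channel quality

updates = [local_update(w, X, y) for X, y in data]
chosen = schedule(updates, capacities, k)
received = [compress(updates[i], budget=int(capacities[i] * d)) for i in chosen]
w = w + np.mean(received, axis=0)  # PS averages the scheduled, compressed updates
```

Changing `k` here mirrors the paper's observation: with i.i.d. device data a small `k` suffices, while more biased, non-i.i.d. distributions call for scheduling more devices per round.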