Title
Lottery Aware Sparsity Hunting: Enabling Federated Learning on Resource-Limited Edge
Authors
Abstract
Edge devices can benefit remarkably from federated learning due to their distributed nature; however, their limited resources and computing power pose limitations in deployment. A possible solution to this problem is to utilize off-the-shelf sparse learning algorithms at the clients to meet their resource budgets. However, such naive deployment in the clients causes significant accuracy degradation, especially for highly resource-constrained clients. In particular, our investigations reveal that a lack of consensus in the sparsity masks among the clients may slow down the convergence of the global model and cause a substantial accuracy drop. With these observations, we present \textit{federated lottery aware sparsity hunting} (FLASH), a unified sparse learning framework for training a sparse sub-model that maintains performance under ultra-low parameter density while yielding proportional communication benefits. Moreover, given that different clients may have different resource budgets, we present \textit{hetero-FLASH}, in which clients can adopt different density budgets based on their device resource limitations instead of supporting only one target parameter density. Experimental analysis on diverse models and datasets shows the superiority of FLASH in closing the gap with an unpruned baseline, yielding up to $\mathord{\sim}10.1\%$ improved accuracy with a $\mathord{\sim}10.26\times$ reduction in communication cost compared to existing alternatives at similar hyperparameter settings. Code is available at \url{https://github.com/SaraBabakN/flash_fl}.
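FLASH's mask-coordination strategy is beyond the scope of the abstract, but the basic per-client operation it builds on — pruning a weight tensor to a target parameter density by keeping only the largest-magnitude entries — can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the `density_mask` helper and its signature are assumptions for exposition.

```python
import numpy as np

def density_mask(weights: np.ndarray, density: float) -> np.ndarray:
    """Binary mask keeping roughly the top `density` fraction of weights by magnitude.

    Illustrative helper (not FLASH's actual API): a client with a 10% density
    budget would train only the weights where this mask is 1.
    """
    k = max(1, int(round(density * weights.size)))
    # Threshold at the k-th largest magnitude; exact ties may keep a few extra entries.
    thresh = np.sort(np.abs(weights), axis=None)[-k]
    return (np.abs(weights) >= thresh).astype(weights.dtype)

# Example: prune one layer's weights to a 10% parameter density budget.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
mask = density_mask(w, density=0.10)
sparse_w = w * mask
print(f"kept fraction: {mask.mean():.3f}")  # close to the 0.10 budget
```

The abstract's observation is that when each client derives such a mask independently, the kept coordinates disagree across clients, which is what FLASH's lottery-aware coordination addresses.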