Paper Title

FedLesScan: Mitigating Stragglers in Serverless Federated Learning

Authors

Mohamed Elzohairy, Mohak Chadha, Anshul Jindal, Andreas Grafberger, Jianfeng Gu, Michael Gerndt, Osama Abboud

Abstract

Federated Learning (FL) is a machine learning paradigm that enables the training of a shared global model across distributed clients while keeping the training data local. While most prior work on designing systems for FL has focused on stateful, always-running components, recent work has shown that components in an FL system can benefit greatly from serverless computing and Function-as-a-Service technologies. As a result, distributed training of models with serverless FL systems can be more resource-efficient and cheaper than with conventional FL systems. However, serverless FL systems still suffer from stragglers, i.e., clients that are slow because of resource and statistical heterogeneity. While several strategies have been proposed for mitigating stragglers in FL, most do not account for the particular characteristics of serverless environments, i.e., cold starts, performance variations, and the ephemeral, stateless nature of function instances. To address this, we propose FedLesScan, a novel clustering-based semi-asynchronous training strategy specifically tailored for serverless FL. FedLesScan dynamically adapts to the behaviour of clients and minimizes the effect of stragglers on the overall system. We implement our strategy by extending an open-source serverless FL system called FedLess. Moreover, we comprehensively evaluate our strategy using 2nd generation Google Cloud Functions with four datasets and varying percentages of stragglers. Results from our experiments show that, compared to other approaches, FedLesScan reduces training time and cost by an average of 8% and 20%, respectively, while utilizing clients better, with an average increase in the effective update ratio of 17.75%.
