Title
MLLess: Achieving Cost Efficiency in Serverless Machine Learning Training
Authors
Abstract
Function-as-a-Service (FaaS) has sparked growing interest in how to "tame" serverless computing to enable domain-specific use cases such as data-intensive applications and machine learning (ML). Recently, several systems have been implemented for training ML models. Certainly, these research articles are significant steps in the right direction. However, they do not fully answer the nagging question of when serverless ML training can be more cost-effective than traditional "serverful" computing. To help in this endeavor, we propose MLLess, a FaaS-based ML training prototype built atop IBM Cloud Functions. To boost cost-efficiency, MLLess implements two innovative optimizations tailored to the traits of serverless computing: on the one hand, a significance filter, to make indirect communication more effective; on the other hand, a scale-in auto-tuner, to reduce cost by benefiting from the FaaS sub-second billing model (often per 100 ms). Our results certify that MLLess can be 15X faster than serverful ML systems at a lower cost for sparse ML models that exhibit fast convergence, such as sparse logistic regression and matrix factorization. Furthermore, our results show that MLLess can easily scale out to increasingly large fleets of serverless workers.
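The abstract does not detail how the significance filter works; the following is a minimal sketch of the general idea behind such filters (suppressing model updates whose relative magnitude is small, so that workers communicate fewer values). The function name `significant_updates`, the relative-magnitude criterion, and the default threshold are illustrative assumptions, not MLLess's actual implementation.

```python
import numpy as np

def significant_updates(delta, model, threshold=0.01):
    """Hypothetical significance filter: keep only the coordinates of a
    worker's local update `delta` whose change relative to the current
    model weights exceeds `threshold`. Returns (indices, values) so that
    only a sparse subset of the update needs to be communicated."""
    # Relative significance of each coordinate's update (epsilon avoids
    # division by zero for weights that are exactly 0).
    significance = np.abs(delta) / (np.abs(model) + 1e-8)
    mask = significance > threshold
    idx = np.nonzero(mask)[0]
    return idx, delta[idx]

# Example: only the coordinate with a large relative change survives.
model = np.array([1.0, 2.0, 3.0])
delta = np.array([0.5, 0.001, 0.002])
idx, vals = significant_updates(delta, model, threshold=0.01)
# idx contains only coordinate 0; coordinates 1 and 2 are filtered out.
```

For sparse models such as the logistic regression and matrix factorization workloads mentioned above, most per-step updates touch few coordinates significantly, which is what makes this kind of filtering effective for indirect (e.g., storage-mediated) communication.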