Paper Title
HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline
Paper Authors
Paper Abstract
Prior research in resource scheduling for machine learning training workloads has largely focused on minimizing job completion times. Commonly, these model training workloads collectively search over a large number of parameter values that control the learning process in a hyperparameter search. It is preferable to identify and maximally provision the best-performing hyperparameter configuration (trial) to achieve the highest accuracy result as soon as possible. To optimally trade off evaluating multiple configurations and training the most promising ones by a fixed deadline, we design and build HyperSched -- a dynamic application-level resource scheduler to track, identify, and preferentially allocate resources to the best-performing trials to maximize accuracy by the deadline. HyperSched leverages three properties of a hyperparameter search workload overlooked in prior work -- trial disposability, progressively identifiable rankings among different configurations, and space-time constraints -- to outperform standard hyperparameter search algorithms across a variety of benchmarks.
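To make the scheduling idea in the abstract concrete, the toy Python sketch below simulates the general pattern it describes: trials are trained in parallel, periodically ranked, the weakest running trial is stopped (trial disposability), and its resources are shifted to the current leader before a fixed budget runs out. This is a minimal illustration only, not HyperSched's actual algorithm; the accuracy model, the constants (TOTAL_ROUNDS, REALLOC_EVERY, NUM_TRIALS), and the Trial class are hypothetical.

```python
import random

# Illustrative sketch of deadline-aware trial scheduling (not the HyperSched
# implementation). Trials are periodically ranked; the weakest is stopped and
# its workers are handed to the leader so the most promising configuration
# trains fastest within a fixed budget. All constants are hypothetical.

TOTAL_ROUNDS = 100     # stand-in for the fixed deadline
REALLOC_EVERY = 10     # how often rankings are re-examined
NUM_TRIALS = 8         # one worker per trial at the start


class Trial:
    def __init__(self, trial_id, learning_rate):
        self.trial_id = trial_id
        self.learning_rate = learning_rate  # example hyperparameter
        self.workers = 1                    # resources currently assigned
        self.accuracy = 0.0
        self.stopped = False

    def train_round(self):
        # Toy progress model: more workers -> faster (noisy) improvement.
        gain = 0.002 * self.workers * self.learning_rate * random.uniform(0.5, 1.5)
        self.accuracy = min(1.0, self.accuracy + gain)


def reallocate(trials):
    """Stop the weakest running trial and give its workers to the leader."""
    running = [t for t in trials if not t.stopped]
    if len(running) <= 1:
        return
    running.sort(key=lambda t: t.accuracy, reverse=True)
    leader, weakest = running[0], running[-1]
    weakest.stopped = True                  # trials are disposable
    leader.workers += weakest.workers       # shift freed resources to the leader
    weakest.workers = 0


def run():
    random.seed(0)
    trials = [Trial(i, learning_rate=random.uniform(0.2, 1.0)) for i in range(NUM_TRIALS)]
    for round_idx in range(1, TOTAL_ROUNDS + 1):
        for t in trials:
            if not t.stopped:
                t.train_round()
        if round_idx % REALLOC_EVERY == 0:
            reallocate(trials)              # rankings become clearer over time
    best = max(trials, key=lambda t: t.accuracy)
    print(f"Best trial {best.trial_id}: accuracy={best.accuracy:.3f}, workers={best.workers}")


if __name__ == "__main__":
    run()
```

In this sketch the reallocation step plays the role the abstract assigns to HyperSched: exploiting progressively identifiable rankings to concentrate a fixed resource pool on the most promising trial before the deadline.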