Paper Title

Practical tradeoffs between memory, compute, and performance in learned optimizers

Paper Authors

Luke Metz, C. Daniel Freeman, James Harrison, Niru Maheswaranathan, Jascha Sohl-Dickstein

Paper Abstract

Optimization plays a costly and crucial role in developing machine learning systems. In learned optimizers, the few hyperparameters of commonly used hand-designed optimizers, e.g. Adam or SGD, are replaced with flexible parametric functions. The parameters of these functions are then optimized so that the resulting learned optimizer minimizes a target loss on a chosen class of models. Learned optimizers can both reduce the number of required training steps and improve the final test loss. However, they can be expensive to train, and once trained can be expensive to use due to computational and memory overhead for the optimizer itself. In this work, we identify and quantify the design features governing the memory, compute, and performance trade-offs for many learned and hand-designed optimizers. We further leverage our analysis to construct a learned optimizer that is both faster and more memory efficient than previous work. Our model and training code are open source.
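The abstract describes the core idea of a learned optimizer: the hand-designed update rule (e.g. Adam or SGD) is replaced by a parametric function whose own weights, the meta-parameters, are trained so that models optimized with it reach low loss. The minimal sketch below illustrates this in JAX with a tiny per-parameter MLP that maps gradient and momentum features to an update step. The feature set, network size, and names (`init_lopt`, `lopt_step`) are illustrative assumptions for this example, not the paper's actual architecture; the authors' open-source code contains the real implementation.

```python
import jax
import jax.numpy as jnp

def init_lopt(key, hidden=32):
    """Initialise the learned optimizer's own weights (meta-parameters theta).

    Meta-training would tune these so that inner training runs using
    `lopt_step` reach low loss across a distribution of tasks.
    """
    k1, k2 = jax.random.split(key)
    return {
        "w1": 0.1 * jax.random.normal(k1, (2, hidden)),  # 2 features per parameter
        "b1": jnp.zeros(hidden),
        "w2": 0.1 * jax.random.normal(k2, (hidden, 1)),
        "b2": jnp.zeros(1),
    }

def lopt_step(theta, param, grad, momentum, decay=0.9, scale=1e-3):
    """One learned-optimizer update for a single parameter tensor.

    A small MLP maps per-parameter features (gradient, momentum) to a step,
    replacing a hand-designed rule such as Adam. The momentum buffer is the
    extra per-parameter memory this particular sketch carries.
    """
    new_m = decay * momentum + (1.0 - decay) * grad
    feats = jnp.stack([grad.ravel(), new_m.ravel()], axis=-1)    # (n, 2)
    h = jnp.tanh(feats @ theta["w1"] + theta["b1"])              # (n, hidden)
    step = (h @ theta["w2"] + theta["b2"]).reshape(param.shape)  # back to param shape
    return param - scale * step, new_m

# Usage: apply one learned update to a single weight matrix.
key = jax.random.PRNGKey(0)
theta = init_lopt(key)
w = jnp.ones((4, 3))
g = 0.5 * jnp.ones((4, 3))
m = jnp.zeros_like(w)
w, m = lopt_step(theta, w, g, m)
```

Even this toy version makes the paper's trade-off concrete: compared with SGD, the update costs an extra MLP evaluation per parameter (compute) and an extra momentum buffer (memory), which is the kind of overhead the paper quantifies and reduces.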
