二重性优化器会从循环中受益吗

论文标题

二重性优化器会从循环中受益吗

Will Bilevel Optimizers Benefit from Loops

论文作者

Ji, Kaiyi, Liu, Mingrui, Liang, Yingbin, Ying, Lei

论文摘要

二重性优化已成为解决各种机器学习问题的强大工具。当前的两个流行的双杆优化器AIDD-BIO和ITD-BIO自然涉及解决一个或两个子问题，因此，无论我们是否解决了循环（进行许多迭代）或没有循环（仅进行几次迭代）的这些问题，都可以显着影响整体计算效率。文献中的现有研究仅涵盖了其中一些实施选择，并且可用的复杂性范围不足以使不同实施之间进行严格的比较。在本文中，我们首先为AID-BIO和ITD-BIO建立统一的收敛分析，这些分析适用于所有循环的实施选择。然后，我们专门研究结果来表征所有实现的计算复杂性，从而使它们之间进行了明确的比较。我们的结果表明，对于AID-BIO，估计内部功能最佳点的循环对总体效率是有益的，尽管对于每个更新步骤会导致更高的复杂性，而近似于外部级别的Hessian Inspersever-vector产物的循环降低了梯度的复杂性。对于ITD-BIO，这两个循环始终并存，我们的收敛上限和下限表明，这种循环对于保证消失的收敛误差是必要的，而NO-loop方案则遭受不可避免的不可呈现的融合误差。我们的数值实验进一步证实了我们的理论结果。

Bilevel optimization has arisen as a powerful tool for solving a variety of machine learning problems. Two current popular bilevel optimizers AID-BiO and ITD-BiO naturally involve solving one or two sub-problems, and consequently, whether we solve these problems with loops (that take many iterations) or without loops (that take only a few iterations) can significantly affect the overall computational efficiency. Existing studies in the literature cover only some of those implementation choices, and the complexity bounds available are not refined enough to enable rigorous comparison among different implementations. In this paper, we first establish unified convergence analysis for both AID-BiO and ITD-BiO that are applicable to all implementation choices of loops. We then specialize our results to characterize the computational complexity for all implementations, which enable an explicit comparison among them. Our result indicates that for AID-BiO, the loop for estimating the optimal point of the inner function is beneficial for overall efficiency, although it causes higher complexity for each update step, and the loop for approximating the outer-level Hessian-inverse-vector product reduces the gradient complexity. For ITD-BiO, the two loops always coexist, and our convergence upper and lower bounds show that such loops are necessary to guarantee a vanishing convergence error, whereas the no-loop scheme suffers from an unavoidable non-vanishing convergence error. Our numerical experiments further corroborate our theoretical results.

下载PDF全文

下载文献需遵守相关版权规定

论文标题