Paper Title

SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

Paper Authors

Hanseul Cho, Chulhee Yun

Paper Abstract

Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-PŁ and primal-PŁ-PŁ objectives, and obtain convergence rates faster than with-replacement SGDA. Our rates extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a comprehensive lower bound for GDA with an arbitrary step-size ratio, which matches the full-batch upper bound for the primal-PŁ-PŁ case.
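
As a rough illustration of the "without-replacement sampling" the abstract refers to, below is a minimal NumPy sketch of simultaneous SGDA with random reshuffling (SGDA-RR). The component-gradient callables grad_x_i / grad_y_i, the step sizes alpha / beta, and the epoch count are hypothetical placeholders, not the paper's exact procedure or notation.

```python
# Minimal sketch of simultaneous SGDA with random reshuffling (SGDA-RR) for a
# finite-sum minimax problem  min_x max_y (1/n) * sum_i f_i(x, y).
# All names here (grad_x_i, grad_y_i, alpha, beta, ...) are illustrative.
import numpy as np

def sgda_rr(grad_x_i, grad_y_i, x, y, n, epochs, alpha, beta, seed=0):
    """grad_x_i(i, x, y), grad_y_i(i, x, y): gradients of the i-th component
    with respect to x and y; alpha/beta: descent/ascent step sizes."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        perm = rng.permutation(n)      # reshuffle the n components once per epoch
        for i in perm:                 # then use each component exactly once
            gx = grad_x_i(i, x, y)     # both gradients evaluated at the same
            gy = grad_y_i(i, x, y)     # point: the "simultaneous" update
            x = x - alpha * gx         # descent step on the min variable
            y = y + beta * gy          # ascent step on the max variable
    return x, y
```

In the alternating variant also analyzed in the abstract, the ascent gradient would instead be evaluated at the freshly updated x within the same component step; with-replacement SGDA would draw i uniformly at random at every step rather than sweeping over a permutation.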
