Paper Title

SGDA with shuffling: faster convergence for nonconvex-PŁ minimax optimization

Paper Authors

Hanseul Cho, Chulhee Yun

Paper Abstract

Stochastic gradient descent-ascent (SGDA) is one of the main workhorses for solving finite-sum minimax optimization problems. Most practical implementations of SGDA randomly reshuffle components and sequentially use them (i.e., without-replacement sampling); however, there are few theoretical results on this approach for minimax algorithms, especially outside the easier-to-analyze (strongly-)monotone setups. To narrow this gap, we study the convergence bounds of SGDA with random reshuffling (SGDA-RR) for smooth nonconvex-nonconcave objectives with Polyak-Łojasiewicz (PŁ) geometry. We analyze both simultaneous and alternating SGDA-RR for nonconvex-PŁ and primal-PŁ-PŁ objectives, and obtain convergence rates faster than with-replacement SGDA. Our rates extend to mini-batch SGDA-RR, recovering known rates for full-batch gradient descent-ascent (GDA). Lastly, we present a comprehensive lower bound for GDA with an arbitrary step-size ratio, which matches the full-batch upper bound for the primal-PŁ-PŁ case.
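
As a rough illustration of the "without-replacement sampling" the abstract refers to, below is a minimal NumPy sketch of simultaneous SGDA with random reshuffling (SGDA-RR). The component-gradient callables grad_x_i / grad_y_i, the step sizes alpha / beta, and the epoch count are hypothetical placeholders, not the paper's exact procedure or notation.

```python
# Minimal sketch of simultaneous SGDA with random reshuffling (SGDA-RR) for a
# finite-sum minimax problem  min_x max_y (1/n) * sum_i f_i(x, y).
# All names here (grad_x_i, grad_y_i, alpha, beta, ...) are illustrative.
import numpy as np

def sgda_rr(grad_x_i, grad_y_i, x, y, n, epochs, alpha, beta, seed=0):
    """grad_x_i(i, x, y), grad_y_i(i, x, y): gradients of the i-th component
    with respect to x and y; alpha/beta: descent/ascent step sizes."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        perm = rng.permutation(n)      # reshuffle the n components once per epoch
        for i in perm:                 # then use each component exactly once
            gx = grad_x_i(i, x, y)     # both gradients evaluated at the same
            gy = grad_y_i(i, x, y)     # point: the "simultaneous" update
            x = x - alpha * gx         # descent step on the min variable
            y = y + beta * gy          # ascent step on the max variable
    return x, y
```

In the alternating variant also analyzed in the abstract, the ascent gradient would instead be evaluated at the freshly updated x within the same component step; with-replacement SGDA would draw i uniformly at random at every step rather than sweeping over a permutation.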
