Paper Title
Sampling without Replacement Leads to Faster Rates in Finite-Sum Minimax Optimization
Paper Authors
Paper Abstract
We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to sampling with replacement. For the smooth and strongly convex-strongly concave setting, we consider gradient descent ascent and the proximal point method, and present a unified analysis of two popular without-replacement sampling strategies, namely Random Reshuffling (RR), which shuffles the data every epoch, and Single Shuffling or Shuffle Once (SO), which shuffles only at the beginning. We obtain tight convergence rates for RR and SO and demonstrate that these strategies lead to faster convergence than uniform sampling. Moving beyond convexity, we obtain similar results for smooth nonconvex-nonconcave objectives satisfying a two-sided Polyak-Łojasiewicz inequality. Finally, we demonstrate that our techniques are general enough to analyze the effect of data-ordering attacks, where an adversary manipulates the order in which data points are supplied to the optimizer. Our analysis also recovers tight rates for the incremental gradient method, where the data points are not shuffled at all.
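As a concrete illustration of the sampling strategies compared in the abstract, below is a minimal NumPy sketch of epoch-based stochastic gradient descent ascent under the four data orders discussed: Random Reshuffling (RR), Shuffle Once (SO), the incremental gradient order (IG), and with-replacement uniform sampling. This is not code from the paper; the function names (sgda_epochs, grads), the step size, and the toy objective are illustrative assumptions.

```python
import numpy as np

def sgda_epochs(grads, n, x0, y0, lr=0.01, epochs=50, strategy="RR", rng=None):
    """Epoch-based stochastic gradient descent ascent (SGDA).

    grads(i, x, y) -> (gx, gy) returns the gradient of the i-th summand
    with respect to x (the min variable) and y (the max variable).
    strategy: "RR"      -- reshuffle the data every epoch,
              "SO"      -- shuffle once before the first epoch,
              "IG"      -- incremental gradient, fixed order 0..n-1,
              "uniform" -- sample with replacement at every step.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y = np.array(x0, dtype=float), np.array(y0, dtype=float)
    so_order = rng.permutation(n)  # frozen permutation reused by SO
    for _ in range(epochs):
        if strategy == "RR":
            order = rng.permutation(n)
        elif strategy == "SO":
            order = so_order
        elif strategy == "IG":
            order = np.arange(n)
        else:  # "uniform": with-replacement baseline
            order = rng.integers(0, n, size=n)
        for i in order:
            gx, gy = grads(i, x, y)
            x = x - lr * gx  # descend on the min variable
            y = y + lr * gy  # ascend on the max variable
    return x, y
```

A toy smooth, strongly convex-strongly concave instance, constructed so the saddle point sits at the origin, can be used to compare the strategies:

```python
# f_i(x, y) = (a_i / 2) x^2 + b_i x y - (c_i / 2) y^2, with a_i, c_i > 0.
n = 100
data_rng = np.random.default_rng(0)
a = 1.0 + data_rng.random(n)
b = data_rng.standard_normal(n)
c = 1.0 + data_rng.random(n)

def grads(i, x, y):
    return a[i] * x + b[i] * y, b[i] * x - c[i] * y

for s in ("RR", "SO", "IG", "uniform"):
    x, y = sgda_epochs(grads, n, 1.0, 1.0, lr=0.05, epochs=100,
                       strategy=s, rng=np.random.default_rng(1))
    print(f"{s:>7s}: distance to saddle = {float(np.hypot(x, y)):.3e}")
```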