Paper Title
Asynchronous SGD Beats Minibatch SGD Under Arbitrary Delays
Paper Authors
Paper Abstract
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades dramatically when any delay is large, giving the impression that performance depends primarily on the delay. On the contrary, we prove much better guarantees for the same asynchronous SGD algorithm regardless of the delays in the gradients, depending instead just on the number of parallel devices used to implement the algorithm. Our guarantees are strictly better than the existing analyses, and we also argue that asynchronous SGD outperforms synchronous minibatch SGD in the settings we consider. For our analysis, we introduce a novel recursion based on "virtual iterates" and delay-adaptive stepsizes, which allow us to derive state-of-the-art guarantees for both convex and non-convex objectives.
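To make the algorithm concrete, below is a minimal single-process simulation of asynchronous SGD with delay-adaptive stepsizes on a toy least-squares problem. The delay model (one uniformly chosen worker finishes per step) and the stepsize rule eta_t = eta / (1 + tau_t) are illustrative assumptions for this sketch, not the paper's exact schedule or constants.

```python
# Minimal sketch: asynchronous SGD with a delay-adaptive stepsize,
# simulated in a single process. The delay model and the stepsize
# form below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))          # toy data matrix
b = rng.normal(size=200)                # toy targets
x = np.zeros(10)                        # shared model parameters

def stochastic_grad(x):
    """Gradient of 0.5*(a_i @ x - b_i)^2 at a uniformly sampled row i."""
    i = rng.integers(len(b))
    return A[i] * (A[i] @ x - b[i])

num_workers = 4
base_eta = 0.1
# Each worker starts computing a gradient at the initial point;
# we track (gradient, iteration at which its computation started).
in_flight = [(stochastic_grad(x), 0) for _ in range(num_workers)]

for t in range(1, 5001):
    w = rng.integers(num_workers)       # an arbitrary worker finishes
    g, t_start = in_flight[w]
    tau = t - t_start                   # delay (staleness) of this gradient
    eta = base_eta / (1 + tau)          # delay-adaptive stepsize (assumed form)
    x -= eta * g                        # apply the stale gradient
    in_flight[w] = (stochastic_grad(x), t)  # worker restarts at the new point

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```

The point of the delay-adaptive rule is visible in the update loop: a gradient that arrives with large staleness tau is applied with a proportionally smaller step, so no single very-late gradient can destabilize the iterates, matching the abstract's claim that the guarantees do not hinge on the worst-case delay.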