Paper Title

Explicit Regularization of Stochastic Gradient Methods through Duality

Paper Authors

Anant Raj, Francis Bach

Paper Abstract

We consider stochastic gradient methods under the interpolation regime where a perfect fit can be obtained (minimum loss at each observation). While previous work highlighted the implicit regularization of such algorithms, we consider an explicit regularization framework as a minimum Bregman divergence convex feasibility problem. Using convex duality, we propose randomized Dykstra-style algorithms based on randomized dual coordinate ascent. For non-accelerated coordinate descent, we obtain an algorithm which bears strong similarities with (non-averaged) stochastic mirror descent on specific functions, as it is equivalent for quadratic objectives, and equivalent in the early iterations for more general objectives. It comes with the benefit of an explicit convergence theorem to a minimum norm solution. For accelerated coordinate descent, we obtain a new algorithm that has better convergence properties than existing stochastic gradient methods in the interpolating regime. This leads to accelerated versions of the perceptron for generic $\ell_p$-norm regularizers, which we illustrate in experiments.
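To make the quadratic special case concrete, here is a minimal NumPy sketch (our own illustration under stated assumptions, not the paper's algorithm). With the squared Euclidean norm as the Bregman divergence, interpolation is a convex feasibility problem over the hyperplanes $x_i^\top w = y_i$, and randomized dual coordinate ascent reduces to the randomized Kaczmarz method: each step exactly projects the iterate onto one randomly sampled constraint. The function name and problem sizes below are hypothetical.

```python
import numpy as np


def randomized_kaczmarz(X, y, n_iters=2000, seed=0):
    """Randomized projection onto one interpolation constraint per step.

    In the quadratic (squared Euclidean norm) case, randomized dual
    coordinate ascent on the constraints x_i^T w = y_i reduces to this
    update; started from w = 0, the iterates converge to the minimum
    l2-norm interpolating solution of a consistent system.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        i = rng.integers(n)  # sample one constraint uniformly at random
        x_i = X[i]
        # exact projection of w onto the hyperplane {w : x_i @ w = y[i]}
        w += (y[i] - x_i @ w) / (x_i @ x_i) * x_i
    return w


# Underdetermined consistent system, so interpolation is possible.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 50))
y = X @ rng.standard_normal(50)

w_hat = randomized_kaczmarz(X, y)
w_min = np.linalg.lstsq(X, y, rcond=None)[0]  # minimum l2-norm solution
print(np.linalg.norm(w_hat - w_min))  # should be small: same interpolant
```

Starting from $w = 0$ keeps every iterate in the row space of $X$, which is why the limit is the minimum $\ell_2$-norm interpolant. The paper's accelerated coordinate ascent and generic $\ell_p$-norm (Bregman) regularizers go beyond this quadratic sketch.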
