Paper Title

A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization

Authors

Dounia Lakhmiri, Dominique Orban, Andrea Lodi

Abstract

We consider the problem of training a deep neural network with nonsmooth regularization to retrieve a sparse and efficient sub-structure. Our regularizer is only assumed to be lower semi-continuous and prox-bounded. We combine an adaptive quadratic regularization approach with proximal stochastic gradient principles to derive a new solver, called SR2, whose convergence and worst-case complexity are established without knowledge or approximation of the gradient's Lipschitz constant. We formulate a stopping criterion that ensures an appropriate first-order stationarity measure converges to zero under certain conditions. We establish a worst-case iteration complexity of $\mathcal{O}(\epsilon^{-2})$ that matches those of related methods like ProxGEN, where the learning rate is assumed to be related to the Lipschitz constant. Our experiments on network instances trained on CIFAR-10 and CIFAR-100 with $\ell_1$ and $\ell_0$ regularizations show that SR2 consistently achieves higher sparsity and accuracy than related methods such as ProxGEN and ProxSGD.
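
To make the algorithmic idea concrete, here is a minimal Python/NumPy sketch of one step that combines a quadratic-regularization model with a proximal stochastic gradient update, using the soft-thresholding prox of an $\ell_1$ regularizer. The parameter names (`sigma`, `eta`, `gamma`), the acceptance test, and the update of `sigma` are illustrative assumptions, not the authors' exact SR2 algorithm.

```python
# Illustrative sketch of one "quadratic regularization + proximal stochastic
# gradient" iteration in the spirit described in the abstract.  NOT the
# authors' SR2 method: the acceptance test and sigma update are assumptions.
import numpy as np

def prox_l1(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def quadratic_reg_prox_step(x, stoch_grad, stoch_loss, lam, sigma,
                            eta=1e-4, gamma=2.0):
    """One adaptive quadratic-regularization proximal step.

    x          : current iterate (1-D numpy array)
    stoch_grad : callable returning a stochastic gradient estimate at a point
    stoch_loss : callable returning a stochastic loss estimate at a point
    lam        : weight of the l1 regularizer
    sigma      : quadratic-regularization parameter (inverse step size)
    """
    g = stoch_grad(x)
    # Trial step minimizes the model g^T s + (sigma/2)||s||^2 + lam*||x + s||_1.
    x_trial = prox_l1(x - g / sigma, lam / sigma)
    s = x_trial - x

    # Compare actual vs. predicted (model) decrease, trust-region style.
    actual = (stoch_loss(x) + lam * np.abs(x).sum()
              - stoch_loss(x_trial) - lam * np.abs(x_trial).sum())
    predicted = -(g @ s + 0.5 * sigma * (s @ s)) \
                + lam * (np.abs(x).sum() - np.abs(x_trial).sum())

    if predicted > 0 and actual >= eta * predicted:
        return x_trial, sigma / gamma   # successful step: relax sigma
    return x, sigma * gamma             # unsuccessful step: tighten sigma
```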
