Paper Title
Get rid of your constraints and reparametrize: A study in NNLS and implicit bias
Paper Authors
Paper Abstract
Over the past years, there has been significant interest in understanding the implicit bias of gradient descent optimization and its connection to the generalization properties of overparametrized neural networks. Several works observed that when training linear diagonal networks on the square loss for regression tasks (which corresponds to overparametrized linear regression), gradient descent converges to special solutions, e.g., non-negative ones. We connect this observation to Riemannian optimization and view overparametrized GD with identical initialization as Riemannian GD. We use this fact to solve non-negative least squares (NNLS), an important problem behind many techniques, e.g., non-negative matrix factorization. We show that gradient flow on the reparametrized objective converges globally to NNLS solutions, providing convergence rates also for its discretized counterpart. Unlike previous methods, we do not rely on the calculation of exponential maps or geodesics. We further show accelerated convergence using a second-order ODE, lending itself to accelerated descent methods. Finally, we establish stability against negative perturbations and discuss generalization to other constrained optimization problems.
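The reparametrization idea described in the abstract can be illustrated with a short, self-contained sketch. The code below is an illustrative toy example, not the paper's exact algorithm or step-size schedule: it removes the non-negativity constraint of NNLS by writing x = u * u (elementwise), runs plain gradient descent on u from an identical small initialization, and reads off an entrywise non-negative iterate x = u * u. The problem size, step size eta, initialization scale, and iteration count are arbitrary choices made for the sketch.

import numpy as np

# Toy NNLS instance: min_{x >= 0} 0.5 * ||A x - b||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)

u = 0.1 * np.ones(20)   # identical, small initialization (illustrative choice)
eta = 1e-3              # conservative fixed step size (illustrative choice)

for _ in range(100_000):
    x = u * u                     # non-negative by construction
    grad_x = A.T @ (A @ x - b)    # gradient of the least-squares loss in x
    u -= eta * 2.0 * u * grad_x   # chain rule: gradient in u of f(u * u)

x_nnls = u * u                    # approximate NNLS solution, entrywise >= 0

# Optional sanity check against a reference solver:
# from scipy.optimize import nnls
# x_ref, _ = nnls(A, b)
# print(np.max(np.abs(x_nnls - x_ref)))

Because x is always the elementwise square of u, no projection or constraint handling is needed during the iteration; the non-negativity emerges from the parametrization together with the implicit bias of gradient descent discussed in the abstract.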