Paper Title

Gradient Descent for Deep Matrix Factorization: Dynamics and Implicit Bias towards Low Rank

Authors

Hung-Hsu Chou, Carsten Gieshoff, Johannes Maly, Holger Rauhut

Abstract

In deep learning, it is common to use more network parameters than training points. In such a scenario of over-parameterization, there are usually multiple networks that achieve zero training error, so that the training algorithm induces an implicit bias on the computed solution. In practice, (stochastic) gradient descent tends to prefer solutions which generalize well, which provides a possible explanation of the success of deep learning. In this paper we analyze the dynamics of gradient descent in the simplified setting of linear networks and of an estimation problem. Although we are not in an over-parameterized scenario, our analysis nevertheless provides insights into the phenomenon of implicit bias. In fact, we derive a rigorous analysis of the dynamics of vanilla gradient descent, and characterize the dynamical convergence of the spectrum. We are able to accurately locate time intervals where the effective rank of the iterates is close to the effective rank of a low-rank projection of the ground-truth matrix. In practice, those intervals can be used as criteria for early stopping if a certain regularity is desired. We also provide empirical evidence for implicit bias in more general scenarios, such as matrix sensing and random initialization. This suggests that deep learning prefers trajectories whose complexity (measured in terms of effective rank) is monotonically increasing, which we believe is a fundamental concept for the theoretical understanding of deep learning.
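To make the setting concrete, below is a minimal NumPy sketch of the kind of experiment the abstract describes: vanilla (full-batch) gradient descent on a depth-3 linear network (deep matrix factorization) fitted to a low-rank ground-truth matrix, while tracking the effective rank of the end-to-end iterate. The entropy-based effective rank (in the spirit of Roy & Vetterli), the scaled-identity initialization, and all sizes, step sizes, and iteration counts are illustrative assumptions, not the paper's exact experimental choices.

```python
import numpy as np

def effective_rank(W, eps=1e-12):
    """Entropy-based effective rank of the singular value distribution
    (one common definition; the paper's exact variant may differ)."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / (s.sum() + eps)
    return float(np.exp(-np.sum(p * np.log(p + eps))))

rng = np.random.default_rng(0)
n, true_rank, depth = 20, 3, 3  # toy sizes, chosen for illustration
Wstar = rng.standard_normal((n, true_rank)) @ rng.standard_normal((true_rank, n))
Wstar /= np.linalg.svd(Wstar, compute_uv=False)[0]  # normalize spectral norm to 1

alpha, lr, steps = 1e-2, 0.2, 5000  # hypothetical small init scale and step size
Ws = [alpha * np.eye(n) for _ in range(depth)]  # scaled-identity initialization

for t in range(steps + 1):
    # end-to-end matrix W = W_depth @ ... @ W_1
    W = Ws[0]
    for Wl in Ws[1:]:
        W = Wl @ W
    E = W - Wstar  # gradient of the loss 0.5 * ||W - Wstar||_F^2 w.r.t. W
    if t % 500 == 0:
        print(f"step {t:4d}  loss {0.5 * np.sum(E**2):.5f}  "
              f"effective rank {effective_rank(W):.2f}")
    # vanilla gradient descent, all factors updated simultaneously:
    # dL/dW_l = (W_depth ... W_{l+1})^T E (W_{l-1} ... W_1)^T
    grads = []
    for l in range(depth):
        left = np.eye(n)
        for Wl in Ws[l + 1:]:
            left = Wl @ left
        right = np.eye(n)
        for Wl in Ws[:l]:
            right = Wl @ right
        grads.append(left.T @ E @ right.T)
    for l in range(depth):
        Ws[l] -= lr * grads[l]
```

With a small initialization scale, the singular values of the end-to-end iterate tend to emerge one at a time, so the printed effective rank traces out the staged trajectory the abstract refers to; the exact numbers depend on the (assumed) hyperparameters above.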
