Paper Title
Implicit Regularization in Deep Learning May Not Be Explainable by Norms
Paper Authors
Paper Abstract
Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norms can explain the implicit regularization in matrix factorization. The current paper resolves this open question in the negative, by proving that there exist natural matrix factorization problems on which the implicit regularization drives all norms (and quasi-norms) towards infinity. Our results suggest that, rather than perceiving the implicit regularization via norms, a potentially more useful interpretation is minimization of rank. We demonstrate empirically that this interpretation extends to a certain class of non-linear neural networks, and hypothesize that it may be key to explaining generalization in deep learning.
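The abstract's test-bed, matrix completion via a linear neural network ("matrix factorization"), can be illustrated with a minimal sketch: parameterize the completed matrix as a product of two weight matrices, fit only the observed entries with gradient descent from small initialization, and inspect the singular values of the result. All sizes, the learning rate, and the initialization scale below are illustrative assumptions, not the paper's experimental setup; the sketch merely demonstrates the low-rank tendency the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Ground-truth rank-1 matrix, observed on a random subset of entries.
u = rng.standard_normal((d, 1))
target = u @ u.T
mask = rng.random((d, d)) < 0.6  # True where the entry is observed

# Depth-2 linear network: the completed matrix is W = W2 @ W1.
# Small initialization is what is believed to induce the implicit bias.
scale = 1e-3
W1 = scale * rng.standard_normal((d, d))
W2 = scale * rng.standard_normal((d, d))

lr = 0.02
for _ in range(30000):
    W = W2 @ W1
    residual = mask * (W - target)  # loss only on observed entries
    # Gradients of 0.5 * ||mask * (W2 @ W1 - target)||^2
    W2 -= lr * (residual @ W1.T)
    W1 -= lr * (W2.T @ residual)

W = W2 @ W1
svals = np.linalg.svd(W, compute_uv=False)
print("singular values:", np.round(svals, 3))
# With small initialization, the recovered W tends to be close to
# rank-1, consistent with the "rank minimization" interpretation.
```

Varying the initialization scale is instructive: with large initialization the unobserved entries are filled in essentially arbitrarily, while shrinking the scale pushes the trailing singular values toward zero.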