Paper Title

On the generalization of learning algorithms that do not converge

Paper Authors

Nisha Chandramoorthy, Andreas Loukas, Khashayar Gatmiry, Stefanie Jegelka

Paper Abstract

Generalization analyses of deep learning typically assume that the training converges to a fixed point. But, recent results indicate that in practice, the weights of deep neural networks optimized with stochastic gradient descent often oscillate indefinitely. To reduce this discrepancy between theory and practice, this paper focuses on the generalization of neural networks whose training dynamics do not necessarily converge to fixed points. Our main contribution is to propose a notion of statistical algorithmic stability (SAS) that extends classical algorithmic stability to non-convergent algorithms and to study its connection to generalization. This ergodic-theoretic approach leads to new insights when compared to the traditional optimization and learning theory perspectives. We prove that the stability of the time-asymptotic behavior of a learning algorithm relates to its generalization and empirically demonstrate how loss dynamics can provide clues to generalization performance. Our findings provide evidence that networks that "train stably generalize better" even when the training continues indefinitely and the weights do not converge.
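For context, classical (uniform) algorithmic stability bounds how much the loss of the learned model can change when a single training example is replaced. The sketch below restates that classical definition and gives one plausible time-averaged reading of it for a non-convergent training trajectory, prompted by the abstract's description of SAS. Here A(S) is the model returned by the algorithm on dataset S, S^(i) is a neighboring dataset with the i-th example replaced, l is the loss, and w_t(S) are the weights after t steps of SGD on S; the time-averaged variant is an illustrative assumption, not the paper's exact definition of statistical algorithmic stability.

```latex
% Classical uniform algorithmic stability (Bousquet & Elisseeff):
% replacing one training example changes the loss by at most beta.
\[
  \sup_{z}\,\bigl|\,\ell\bigl(A(S), z\bigr) - \ell\bigl(A(S^{(i)}), z\bigr)\,\bigr| \le \beta
\]
% Hypothetical time-averaged analogue for a non-convergent SGD trajectory (w_t):
% compare long-run averages of the loss instead of a converged output.
% (An illustrative reading suggested by the abstract, not the paper's SAS definition.)
\[
  \Bigl|\, \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\ell\bigl(w_t(S), z\bigr)
  \;-\; \lim_{T\to\infty}\frac{1}{T}\sum_{t=1}^{T}\ell\bigl(w_t(S^{(i)}), z\bigr) \Bigr| \le \beta
\]
```

Under this reading, stability is a property of the time-asymptotic (ergodic-average) behavior of training rather than of a fixed point, which is consistent with the abstract's claim that networks that "train stably generalize better" even when the weights oscillate indefinitely.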
