Paper Title

Why neural networks find simple solutions: the many regularizers of geometric complexity

Paper Authors

Benoit Dherin, Michael Munn, Mihaela Rosca, David G. T. Barrett

Paper Abstract

In many contexts, simpler models are preferable to more complex models and the control of this model complexity is the goal for many methods in machine learning such as regularization, hyperparameter tuning and architecture design. In deep learning, it has been difficult to understand the underlying mechanisms of complexity control, since many traditional measures are not naturally suitable for deep neural networks. Here we develop the notion of geometric complexity, which is a measure of the variability of the model function, computed using a discrete Dirichlet energy. Using a combination of theoretical arguments and empirical results, we show that many common training heuristics such as parameter norm regularization, spectral norm regularization, flatness regularization, implicit gradient regularization, noise regularization and the choice of parameter initialization all act to control geometric complexity, providing a unifying framework in which to characterize the behavior of deep learning models.
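
For concreteness, here is a minimal sketch (not the authors' released code) of geometric complexity as a discrete Dirichlet energy: the mean squared Frobenius norm of the model's input-output Jacobian over a data batch. The toy network, function names, and shapes below are illustrative assumptions.

```python
# Minimal sketch: geometric complexity as a discrete Dirichlet energy,
# i.e. the mean squared Frobenius norm of the model's input-output
# Jacobian over a batch. The toy network is an assumption, not the
# paper's experimental setup.
import jax
import jax.numpy as jnp

def model(params, x):
    # Stand-in for any smooth model function f_theta: R^in -> R^out.
    w1, b1, w2, b2 = params
    h = jax.nn.relu(x @ w1 + b1)
    return h @ w2 + b2

def geometric_complexity(params, batch):
    # Per-example Jacobian of the outputs w.r.t. the inputs.
    jac_fn = jax.jacrev(lambda x: model(params, x))

    def squared_frobenius(x):
        jac = jac_fn(x)              # shape (out_dim, in_dim)
        return jnp.sum(jac ** 2)     # ||J||_F^2

    # Average over the data: (1/|D|) * sum_x ||grad_x f(x)||_F^2.
    return jnp.mean(jax.vmap(squared_frobenius)(batch))

# Usage on random parameters and data (shapes chosen arbitrarily).
key = jax.random.PRNGKey(0)
k1, k2, kx = jax.random.split(key, 3)
params = (jax.random.normal(k1, (4, 16)), jnp.zeros(16),
          jax.random.normal(k2, (16, 3)), jnp.zeros(3))
batch = jax.random.normal(kx, (32, 4))
print(geometric_complexity(params, batch))
```

On the abstract's reading, the listed training heuristics each implicitly keep this quantity small; an explicit variant would simply add it, scaled by a coefficient, as a penalty term in the training loss.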
