Paper Title
Implicit regularization of dropout
Paper Authors
Paper Abstract
It is important to understand how dropout, a popular regularization method, aids in achieving a good generalization solution during neural network training. In this work, we present a theoretical derivation of an implicit regularization of dropout, which is validated by a series of experiments. Additionally, we numerically study two implications of the implicit regularization, which intuitively rationalize why dropout helps generalization. Firstly, we find that the input weights of hidden neurons tend to condense on isolated orientations when trained with dropout. Condensation is a feature of the nonlinear learning process, and it makes the network less complex. Secondly, we experimentally find that training with dropout leads to a neural network at a flatter minimum compared with standard gradient-descent training, and that the implicit regularization is the key to finding such flat solutions. Although our theory mainly focuses on dropout applied in the last hidden layer, our experiments apply to general dropout in training neural networks. This work points out a distinct characteristic of dropout compared with stochastic gradient descent and serves as an important basis for fully understanding dropout.
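To make the setting concrete, below is a minimal sketch (not the authors' code) of the scenario described in the abstract: a small fully connected network trained with dropout on the last hidden layer, followed by two simple diagnostics, the pairwise cosine similarity of hidden-neuron input weights (a proxy for condensation on isolated orientations) and the loss increase under small random parameter perturbations (a crude proxy for flatness of the minimum). The architecture, toy data, and hyperparameters are illustrative assumptions, written here in PyTorch.

# Minimal sketch (illustrative assumptions only, not the authors' setup):
# dropout on the last hidden layer, plus condensation and flatness probes.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy 2-D regression data (assumed for illustration).
x = torch.rand(256, 2) * 2 - 1
y = torch.sin(3 * x[:, :1]) * torch.cos(2 * x[:, 1:])

class DropoutNet(nn.Module):
    def __init__(self, width=100, p=0.5):
        super().__init__()
        self.hidden = nn.Linear(2, width)
        self.drop = nn.Dropout(p)        # dropout applied to the last hidden layer
        self.out = nn.Linear(width, 1)

    def forward(self, x):
        h = torch.tanh(self.hidden(x))
        return self.out(self.drop(h))

model = DropoutNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

model.train()                            # dropout active during training
for step in range(2000):
    opt.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    opt.step()

# Condensation diagnostic: if input weights of hidden neurons condense on a
# few isolated orientations, their pairwise cosine similarities cluster near +1 or -1.
with torch.no_grad():
    w = model.hidden.weight              # (width, input_dim) input weights
    w = w / w.norm(dim=1, keepdim=True).clamp_min(1e-12)
    cos = w @ w.t()
    print("mean |cosine similarity| between hidden neurons:", cos.abs().mean().item())

# Flatness probe: average loss increase under small random parameter
# perturbations (a smaller increase suggests a flatter minimum).
def loss_at(m):
    m.eval()                             # disable dropout for evaluation
    with torch.no_grad():
        return ((m(x) - y) ** 2).mean().item()

base = loss_at(model)
increases = []
for _ in range(10):
    probe = copy.deepcopy(model)
    with torch.no_grad():
        for p in probe.parameters():
            p.add_(0.05 * torch.randn_like(p))
    increases.append(loss_at(probe) - base)
print("mean loss increase under perturbation:", sum(increases) / len(increases))

Rerunning the same script with the dropout probability set to zero gives a plain gradient-descent baseline against which the condensation and flatness diagnostics can be compared, in the spirit of the comparison described in the abstract.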