Paper Title
Practical Quasi-Newton Methods for Training Deep Neural Networks
Paper Authors
Paper Abstract
We consider the development of practical stochastic quasi-Newton, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS, methods for training deep neural networks (DNNs). In DNN training, the number of variables and of components of the gradient, $n$, is often of the order of tens of millions, and the Hessian has $n^2$ elements. Consequently, computing and storing a full $n \times n$ BFGS approximation, or storing a modest number of (step, change-in-gradient) vector pairs for use in an L-BFGS implementation, is out of the question. In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices. This is analogous to the approach in KFAC, which computes a Kronecker-factored block-diagonal approximation to the Fisher matrix in a stochastic natural gradient method. Because of the indefinite and highly variable nature of the Hessian in a DNN, we also propose a new damping approach to keep the BFGS and L-BFGS approximations bounded both above and below. In tests on autoencoder feed-forward neural network models with either nine or thirteen layers applied to three datasets, our methods outperformed or performed comparably to KFAC and state-of-the-art first-order stochastic methods.
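To make the two main ingredients of the abstract concrete, the sketch below is a minimal NumPy illustration, not the authors' implementation: (i) a standard Powell-style damping of a BFGS curvature pair, one common safeguard in the spirit of the damping described above for keeping the approximation positive definite and well conditioned, and (ii) how a Kronecker-factored block preconditioner $A \otimes G$ can be applied to a single layer's gradient without ever forming the full block. The function names, the damping constant `mu`, and the factors `A_inv`/`G_inv` are illustrative assumptions, not quantities defined in the paper.

```python
import numpy as np


def powell_damped_pair(s, y, B, mu=0.2):
    """Powell-style damping of a curvature pair (s, y) before a BFGS update.

    If the curvature s'y is too small relative to s'Bs (B assumed positive
    definite), blend y with Bs so the updated approximation stays positive
    definite and does not become ill conditioned. `mu` is an illustrative
    damping constant, not a value taken from the paper.
    """
    Bs = B @ s
    sBs = s @ Bs
    sy = s @ y
    if sy < mu * sBs:
        theta = (1.0 - mu) * sBs / (sBs - sy)
        y = theta * y + (1.0 - theta) * Bs
    return y


def bfgs_update(B, s, y):
    """Standard BFGS update of a Hessian approximation B with pair (s, y)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)


def kron_block_precondition(dW, A_inv, G_inv):
    """Apply a Kronecker-factored block preconditioner to one layer's gradient.

    If the layer's Hessian block is approximated by A ⊗ G, the vec identity
    (A ⊗ G)^{-1} vec(dW) = vec(G^{-1} dW A^{-1}) lets us precondition the
    gradient matrix dW (shape d_out x d_in) using only the two small factors.
    """
    return G_inv @ dW @ A_inv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5
    B = np.eye(n)
    s = rng.standard_normal(n)
    y = rng.standard_normal(n)        # in a DNN, s'y may be small or negative
    y = powell_damped_pair(s, y, B)   # damping restores usable curvature
    B = bfgs_update(B, s, y)
```

The vec identity used in `kron_block_precondition` is what makes per-layer quasi-Newton preconditioning affordable: the cost depends only on the sizes of the two small factors, never on the full $n \times n$ block.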