Paper Title
A Trace-restricted Kronecker-Factored Approximation to Natural Gradient
Paper Authors
Paper Abstract
Second-order optimization methods can accelerate convergence by preconditioning the gradient with a curvature matrix, and there have been many attempts to use them for training deep neural networks. Inspired by diagonal approximations and factored approximations such as Kronecker-Factored Approximate Curvature (KFAC), we propose a new approximation to the Fisher information matrix (FIM) called Trace-restricted Kronecker-factored Approximate Curvature (TKFAC), which preserves a certain trace relationship between the exact and the approximate FIM. In TKFAC, we decompose each block of the approximate FIM into the Kronecker product of two smaller matrices, scaled by a coefficient related to the trace. We theoretically analyze TKFAC's approximation error and give an upper bound on it. We also propose a new damping technique for TKFAC on convolutional neural networks to maintain the advantage of second-order optimization methods during training. Experiments show that our method outperforms several state-of-the-art algorithms on some deep network architectures.
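To make the trace restriction concrete, below is a minimal NumPy sketch (an illustration, not the authors' implementation) of the scaling the abstract describes: a FIM block F is approximated by c * kron(A, G), where A and G stand in for the two smaller Kronecker factors and the scalar c is chosen so that the approximation's trace equals tr(F). All three matrices here are random placeholders, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    """Random symmetric positive-definite matrix, used as a stand-in factor."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A = random_spd(3)   # placeholder for the first (smaller) Kronecker factor
G = random_spd(2)   # placeholder for the second (smaller) Kronecker factor
F = random_spd(6)   # placeholder for the exact 6x6 FIM block

# tr(A ⊗ G) = tr(A) * tr(G), so this coefficient makes the traces match.
c = np.trace(F) / (np.trace(A) * np.trace(G))
F_approx = c * np.kron(A, G)

assert np.isclose(np.trace(F_approx), np.trace(F))
```

A single scalar suffices to enforce the trace equality because the trace of a Kronecker product factorizes into the product of the factors' traces.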