Paper Title

Adaptive scaling of the learning rate by second order automatic differentiation

Paper Authors

Frédéric de Gournay, Alban Gossard

Paper Abstract

In the context of the optimization of Deep Neural Networks, we propose to rescale the learning rate using a new automatic differentiation technique. This technique relies on the computation of the curvature, a second-order quantity whose computational complexity lies between that of the gradient and that of the Hessian-vector product. If (1C, 1M) represents respectively the computational time and memory footprint of the gradient method, the new technique increases the overall cost to either (1.5C, 2M) or (2C, 1M). This rescaling has the appealing characteristic of admitting a natural interpretation: it allows the practitioner to choose between exploration of the parameter set and convergence of the algorithm. The rescaling is adaptive; it depends on the data and on the direction of descent. The numerical experiments highlight the different exploration/convergence regimes.
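
To make the idea concrete, here is a minimal sketch of the quadratic-model interpretation behind such a rescaling, not the authors' algorithm: along a descent direction d, the curvature d^T H d obtained by automatic differentiation yields a natural step size eta = (g . d) / (d^T H d), the minimizer of the local quadratic model of the loss along -d. The sketch computes the curvature with a standard forward-over-reverse Hessian-vector product in JAX; the paper's technique obtains the curvature more cheaply than a full Hessian-vector product, which this illustration does not reproduce. The loss function and data below are placeholders.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Simple least-squares loss as a stand-in for a network loss.
    return jnp.mean((x @ w - y) ** 2)

def hvp(f, w, d):
    # Hessian-vector product H d by forward-over-reverse automatic
    # differentiation: differentiate grad(f) at w along direction d.
    return jax.jvp(jax.grad(f), (w,), (d,))[1]

def rescaled_step(w, x, y):
    f = lambda w_: loss(w_, x, y)
    g = jax.grad(f)(w)
    d = g                                   # descent direction (here: the gradient)
    curv = jnp.vdot(d, hvp(f, w, d))        # curvature d^T H d
    # Step size minimizing the quadratic model t -> f(w) - t g.d + (t^2/2) d^T H d;
    # the small floor guards against division by near-zero curvature.
    eta = jnp.vdot(g, d) / jnp.maximum(curv, 1e-12)
    return w - eta * d

# Usage on a synthetic regression problem (hypothetical data).
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 4))
w_true = jnp.array([1.0, -2.0, 0.5, 3.0])
y = x @ w_true
w = jnp.zeros(4)
for _ in range(20):
    w = rescaled_step(w, x, y)
```

With d = g, the step eta = ||g||^2 / (g^T H g) is the classical Cauchy step of steepest descent on a quadratic model; a small eta signals high curvature (the algorithm converges locally), while a large eta signals flat directions (the algorithm explores), which matches the exploration/convergence trade-off described in the abstract.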
