Paper Title

Training Neural Networks in Single vs Double Precision

Authors

Hrycej, Tomas, Bermeitinger, Bernhard, Handschuh, Siegfried

Abstract

The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and RMSprop (a first-order algorithm) has been investigated. Neural networks with one to five fully connected hidden layers, moderate or strong nonlinearity, and up to 4 million network parameters have been optimized for the Mean Square Error (MSE). The training tasks have been set up so that their MSE minimum was known to be zero. Computing experiments have disclosed that single precision can keep up with double precision (with superlinear convergence) as long as the line search finds an improvement. First-order methods such as RMSprop do not benefit from double precision. However, for moderately nonlinear tasks, CG is clearly superior. For strongly nonlinear tasks, both algorithm classes find only solutions that are fairly poor in terms of the mean square error relative to the output variance. CG with double floating-point precision is superior whenever the solutions have the potential to be useful for the application goal.
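The abstract's experimental setup, training tasks whose MSE minimum is known to be zero, compared at float32 and float64, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it uses a teacher-student task (targets generated by a network of the same architecture, so an exact fit exists) and plain gradient descent as a stand-in for RMSprop or CG with line search; all function names and hyperparameters are illustrative assumptions.

```python
import numpy as np


def make_task(n_samples, n_in, n_hidden, n_out, dtype, seed=0):
    """Teacher-student task: targets come from a random 'teacher' network of the
    same architecture, so the MSE minimum of the trained model is known to be zero."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_in)).astype(dtype)
    W1 = rng.standard_normal((n_in, n_hidden)).astype(dtype)
    W2 = rng.standard_normal((n_hidden, n_out)).astype(dtype)
    Y = np.tanh(X @ W1) @ W2  # targets realizable by the model class
    return X, Y


def mse_and_grads(W1, W2, X, Y):
    """MSE loss and its gradients for a one-hidden-layer tanh network."""
    H = np.tanh(X @ W1)
    R = H @ W2 - Y
    mse = float(np.mean(R ** 2))
    dP = 2.0 * R / R.size
    gW2 = H.T @ dP
    gW1 = X.T @ ((dP @ W2.T) * (1.0 - H ** 2))
    return mse, gW1, gW2


def train(dtype, steps=2000, lr=0.05, seed=1):
    """Train the student network at the given floating-point precision
    (simple gradient descent, standing in for RMSprop/CG)."""
    X, Y = make_task(256, 8, 16, 1, dtype)
    rng = np.random.default_rng(seed)
    W1 = (0.1 * rng.standard_normal((8, 16))).astype(dtype)
    W2 = (0.1 * rng.standard_normal((16, 1))).astype(dtype)
    for _ in range(steps):
        mse, gW1, gW2 = mse_and_grads(W1, W2, X, Y)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return mse


if __name__ == "__main__":
    # Same task and optimizer, only the arithmetic precision differs.
    print("float32 final MSE:", train(np.float32))
    print("float64 final MSE:", train(np.float64))
```

Comparing the final MSE values (and, more informatively, the full loss trajectories) against the known optimum of zero is one way to observe whether the lower-precision run stalls earlier than the double-precision run, which is the kind of effect the paper measures for CG and RMSprop.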
