Title
A straightforward line search approach on the expected empirical loss for stochastic deep learning problems
Authors
Abstract
A fundamental challenge in deep learning is that the optimal step sizes for the update steps of stochastic gradient descent are unknown. In traditional optimization, line searches are used to determine good step sizes; in deep learning, however, searching for good step sizes on the expected empirical loss is considered too costly because the measured losses are noisy. This empirical work shows that, for common deep learning tasks, the expected empirical loss can be approximated on vertical cross sections comparatively cheaply. This is achieved by applying traditional one-dimensional function fitting to the noisy losses measured along such a cross section. The step to a minimum of the resulting approximation is then used as the step size for the optimization step. This approach leads to a robust and straightforward optimization method that performs well across datasets and architectures without the need for hyperparameter tuning.
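To make the idea concrete, below is a minimal, self-contained sketch of this kind of line search on noisy losses, not the paper's implementation. The parabolic fit, the toy quadratic loss, the sample count, and the names `noisy_loss` and `parabolic_line_search` are all illustrative assumptions; the paper fits a one-dimensional function to losses measured along a cross section in the same spirit.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_loss(theta, batch_noise=0.05):
    """Stand-in for a mini-batch loss: a quadratic bowl plus sampling noise."""
    return float(np.sum(theta ** 2) + batch_noise * rng.standard_normal())

def parabolic_line_search(theta, direction, max_step=1.0, num_samples=10):
    """Sample noisy losses along `direction`, fit a parabola to them,
    and return the step size at the parabola's minimum (clamped to >= 0)."""
    steps = np.linspace(0.0, max_step, num_samples)
    losses = [noisy_loss(theta + s * direction) for s in steps]
    a, b, c = np.polyfit(steps, losses, deg=2)   # least-squares parabola fit
    if a <= 0:                                   # concave fit: fall back to best sample
        return steps[int(np.argmin(losses))]
    return max(0.0, -b / (2 * a))                # vertex of a*s^2 + b*s + c

# Toy optimization loop: steepest descent with the fitted step size.
theta = np.array([2.0, -1.5])
for it in range(5):
    grad = 2 * theta                             # exact gradient of the toy bowl
    direction = -grad / np.linalg.norm(grad)     # unit descent direction
    step = parabolic_line_search(theta, direction)
    theta = theta + step * direction
    print(f"iter {it}: step={step:.3f}, loss={np.sum(theta**2):.4f}")
```

In a real deep learning setting, `noisy_loss` would evaluate the network's loss on a mini-batch at the shifted parameters, and the fitted minimum along the negative-gradient direction would replace a hand-tuned learning rate.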