Title
AdaSmooth: An Adaptive Learning Rate Method based on Effective Ratio
Authors
Abstract
It is well known that hyper-parameters must be chosen for Momentum, AdaGrad, AdaDelta, and other stochastic optimizers, and in many cases they are tuned tediously by experience, making the process more of an art than a science. We present a novel per-dimension learning rate method for gradient descent called AdaSmooth. The method is insensitive to its hyper-parameters and therefore requires no manual tuning, unlike the Momentum, AdaGrad, and AdaDelta methods. We show promising results compared to other methods on various convolutional neural networks, multi-layer perceptrons, and alternative machine learning tasks. Empirical results demonstrate that AdaSmooth works well in practice and compares favorably to other stochastic optimization methods in neural networks.
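To make the motivation concrete, the sketch below shows a standard per-dimension adaptive update in the style of AdaDelta, one of the baselines named above; its decay rate `rho` and smoothing constant `eps` are exactly the kind of hand-tuned hyper-parameters that the abstract argues AdaSmooth avoids. The function name, default values, and toy problem are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta update (Zeiler, 2012); rho and eps must be hand-tuned.

    x, grad : parameter vector and its gradient (np.ndarray)
    state   : dict holding the running averages E[g^2] and E[dx^2]
    """
    Eg2, Edx2 = state["Eg2"], state["Edx2"]
    # Per-dimension running average of squared gradients.
    Eg2 = rho * Eg2 + (1.0 - rho) * grad**2
    # Per-dimension step: RMS of past updates over RMS of gradients.
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    # Per-dimension running average of squared updates.
    Edx2 = rho * Edx2 + (1.0 - rho) * dx**2
    state["Eg2"], state["Edx2"] = Eg2, Edx2
    return x + dx, state

# Usage sketch on a toy quadratic: minimize 0.5 * ||x||^2, so grad = x.
x = np.array([1.0, -2.0])
state = {"Eg2": np.zeros_like(x), "Edx2": np.zeros_like(x)}
for _ in range(100):
    x, state = adadelta_step(x, x, state)
```

The abstract's claim is that AdaSmooth replaces such a fixed, manually chosen decay constant with a quantity derived from an effective ratio, so that results are not sensitive to the choice; the update rule itself is given in the body of the paper.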