Paper Title

LaProp: Separating Momentum and Adaptivity in Adam

Paper Authors

Liu Ziyin, Zhikang T. Wang, Masahito Ueda

Paper Abstract

We identify a previously unrecognized problem of Adam-style optimizers which results from unnecessary coupling between momentum and adaptivity. The coupling leads to instability and divergence when the momentum and adaptivity parameters are mismatched. In this work, we propose a method, LaProp, which decouples momentum and adaptivity in the Adam-style methods. We show that the decoupling leads to greater flexibility in the hyperparameters and allows for a straightforward interpolation between the signed gradient methods and the adaptive gradient methods. We experimentally show that LaProp consistently improves speed and stability over Adam on a variety of tasks. We also bound the regret of LaProp on a convex problem and show that our bound differs from that of Adam by a key factor, which demonstrates its advantage.
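To make the decoupling concrete, below is a minimal NumPy sketch contrasting the two update rules, following the paper's description: Adam divides the accumulated momentum by the adaptive denominator, whereas LaProp normalizes each gradient by the adaptive denominator before it enters the momentum buffer. The function names, hyperparameter defaults, and epsilon placement here are our own illustration, not the paper's reference implementation.

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum accumulates raw gradients, and the adaptive
    division happens afterward, so beta1 and beta2 are coupled."""
    m = beta1 * m + (1 - beta1) * g          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v

def laprop_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-15):
    """LaProp: each gradient is normalized by the adaptive denominator
    *before* entering the momentum buffer, decoupling beta1 from beta2."""
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
    v_hat = v / (1 - beta2 ** t)             # bias correction for v
    m = beta1 * m + (1 - beta1) * g / (np.sqrt(v_hat) + eps)  # momentum of normalized grads
    p = p - lr * m / (1 - beta1 ** t)        # bias correction for m
    return p, m, v
```

This also illustrates the interpolation mentioned in the abstract: with beta2 = 0 the normalized gradient g / (sqrt(g * g) + eps) reduces to sign(g), recovering a signed gradient method, while beta2 close to 1 gives the usual adaptive behavior.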
