Paper Title
A Scalable Finite Difference Method for Deep Reinforcement Learning
Paper Authors
Paper Abstract
Several low-bandwidth, distributable black-box optimization algorithms in the finite-difference family, such as Evolution Strategies, have recently been shown to perform nearly as well as tailored Reinforcement Learning methods in some Reinforcement Learning domains. One shortcoming of these black-box methods is that they must collect information about the structure of the return function at every update, and can often employ only information drawn from a distribution centered around the current parameters. As a result, when these algorithms are distributed across many machines, a significant portion of total runtime may be spent with many machines idle, waiting for a final return and then for an update to be calculated. In this work we introduce a novel method for using older data in finite difference algorithms, which produces a scalable algorithm that avoids significant idle time or wasted computation.
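To make the finite-difference setting concrete, below is a minimal sketch of the standard antithetic Evolution Strategies gradient estimator (in the style popularized by Salimans et al., 2017) inside a synchronous training loop. The names `return_fn`, `sigma`, and `num_samples` are illustrative assumptions, and this shows only the synchronous baseline whose idle time the abstract criticizes, not the paper's proposed stale-data method.

```python
import numpy as np

def es_gradient_estimate(theta, return_fn, sigma=0.1, num_samples=64, rng=None):
    # Antithetic ES estimator: average (R(theta + sigma*eps) - R(theta - sigma*eps)) * eps
    # over Gaussian perturbations eps ~ N(0, I), a finite-difference estimate of
    # the gradient of the expected return.
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta)
    for _ in range(num_samples):
        eps = rng.standard_normal(theta.shape)
        # In a distributed deployment, each pair of return evaluations would
        # run on a separate worker machine.
        r_plus = return_fn(theta + sigma * eps)
        r_minus = return_fn(theta - sigma * eps)
        grad += (r_plus - r_minus) * eps
    return grad / (2.0 * sigma * num_samples)

def train(theta, return_fn, learning_rate=0.01, num_iterations=100):
    # Synchronous loop: the update cannot be applied until ALL returns for the
    # current theta have arrived, so fast workers sit idle while the slowest
    # episode finishes -- the bottleneck the abstract targets.
    for _ in range(num_iterations):
        theta = theta + learning_rate * es_gradient_estimate(theta, return_fn)
    return theta
```

Because every perturbation in `es_gradient_estimate` is drawn around the single current `theta` and the update in `train` waits on all of them, distributing the inner loop across many machines leaves most of them waiting on the slowest rollout; the paper's contribution, per the abstract, is a way to reuse older returns so that this synchronization cost is avoided.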