Paper title
Adaptive Smoothing Path Integral Control
Paper authors
Abstract
In Path Integral control problems, a representation of an optimally controlled dynamical system can be formally computed and can serve as a guidepost for learning a parametrized policy. The Path Integral Cross-Entropy (PICE) method tries to exploit this, but is hampered by poor sample efficiency. We propose a model-free algorithm called ASPIC (Adaptive Smoothing of Path Integral Control) that applies an inf-convolution to the cost function to speed up convergence of policy optimization. We identify PICE as the infinite smoothing limit of this technique and show that the sample efficiency problems that PICE suffers from disappear for finite levels of smoothing. For zero smoothing, this method becomes a greedy optimization of the cost, which is the standard approach in current reinforcement learning. We show analytically and empirically that intermediate levels of smoothing are optimal, which renders the new method superior to both PICE and direct cost optimization.
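As a rough illustration of what an inf-convolution does to a cost function, the sketch below smooths a one-dimensional toy cost on a grid. It is not the ASPIC algorithm: the quadratic penalty, the toy cost, and the names `inf_convolution`, `toy_cost`, and `weight` are all assumptions made for this example, since the abstract does not state which penalty the method uses. The sketch only shows the two limits the abstract mentions: with essentially no smoothing the original cost is recovered, and with essentially infinite smoothing the cost landscape flattens out.

```python
import numpy as np

def inf_convolution(cost, grid, weight):
    """Grid approximation of min over y of cost(y) + weight/2 * (x - y)^2.

    Assumption: a quadratic penalty (Moreau-envelope style) stands in for
    whatever penalty the paper actually uses.
    """
    values = cost(grid)                      # cost at every candidate y
    diff = grid[:, None] - grid[None, :]     # diff[i, j] = x_i - y_j
    penalised = values[None, :] + 0.5 * weight * diff ** 2
    return penalised.min(axis=1)             # minimise over y for each x

def toy_cost(x):
    # Hypothetical non-convex cost with many local minima.
    return 0.1 * x ** 2 + np.sin(5.0 * x)

grid = np.linspace(-3.0, 3.0, 601)
raw = toy_cost(grid)

# Large weight: moving away from x is very expensive, so almost no smoothing.
heavy = inf_convolution(toy_cost, grid, weight=1e6)
# Small weight: moving is almost free, so almost infinite smoothing.
light = inf_convolution(toy_cost, grid, weight=1e-6)
# Intermediate weight: local bumps are partly filled in.
mid = inf_convolution(toy_cost, grid, weight=5.0)
```

In this toy setting `heavy` coincides with the raw cost, `light` is nearly constant at the global minimum of the raw cost, and `mid` lies between the two. How these limits map onto PICE and direct cost optimization is established in the paper itself, not by this sketch.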