Paper Title
Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes
Paper Authors
Paper Abstract
Monte Carlo methods have become increasingly relevant for control of non-differentiable systems, approximate dynamics models and learning from data. These methods scale to high-dimensional spaces and are effective at the non-convex optimizations often seen in robot learning. We look at sample-based methods from the perspective of inference-based control, specifically posterior policy iteration. From this perspective, we highlight how Gaussian noise priors produce rough control actions that are unsuitable for physical robot deployment. Considering smoother Gaussian process priors, as used in episodic reinforcement learning and motion planning, we demonstrate how smoother model predictive control can be achieved using online sequential inference. This inference is realized through an efficient factorization of the action distribution and a novel means of optimizing the likelihood temperature to improve importance sampling accuracy. We evaluate this approach on several high-dimensional robot control tasks, matching the sample efficiency of prior heuristic methods while also ensuring smoothness. Simulation results can be seen at https://monte-carlo-ppi.github.io/.
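To illustrate the two ideas the abstract highlights, here is a minimal sketch (not the paper's implementation): it contrasts action sequences drawn from an i.i.d. Gaussian "white noise" prior with samples from a smooth Gaussian process prior, and shows one way a likelihood temperature could be tuned via the effective sample size of the importance weights. The squared-exponential kernel, lengthscale, stand-in cost, and ESS target are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

horizon, n_samples = 50, 16
t = np.arange(horizon, dtype=float)[:, None]

# i.i.d. Gaussian prior over an action sequence: temporally uncorrelated, rough.
white_noise_actions = np.random.randn(n_samples, horizon)

# Gaussian process prior with a squared-exponential kernel (assumed here):
# temporally correlated, hence smooth action sequences.
lengthscale = 5.0
K = np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2) + 1e-6 * np.eye(horizon)
L = np.linalg.cholesky(K)
gp_actions = np.random.randn(n_samples, horizon) @ L.T

def normalized_ess(returns, alpha):
    # Exponentiated-return importance weights at temperature alpha,
    # summarized by the normalized effective sample size in (0, 1].
    logw = alpha * (returns - returns.max())
    w = np.exp(logw) / np.exp(logw).sum()
    return 1.0 / (len(returns) * np.sum(w ** 2))

# Stand-in cost (control effort only) purely for demonstration.
returns = -np.sum(gp_actions ** 2, axis=1)

# Pick the temperature whose weights hit a target effective sample size;
# the 0.5 target is an assumption for illustration.
alphas = np.logspace(-3, 1, 50)
target_ess = 0.5
best_alpha = min(alphas, key=lambda a: abs(normalized_ess(returns, a) - target_ess))
print(f"chosen temperature: {best_alpha:.4f}")
```

Sampling through the kernel's Cholesky factor is what gives the GP prior its smoothness, while the ESS-based temperature search is one simple, hedged stand-in for the temperature optimization the abstract refers to.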