Paper Title
Configuration Path Control
Paper Authors
Paper Abstract
Reinforcement learning methods often produce brittle policies -- policies that perform well during training but generalize poorly beyond their direct training experience, and thus become unstable under small disturbances. To address this issue, we propose a method for stabilizing a control policy in the space of configuration paths. It is applied post-training and relies purely on the data produced during training, together with an instantaneous estimate of the control matrix. The approach is evaluated empirically on a planar bipedal walker subjected to a variety of perturbations. Control policies obtained via reinforcement learning are compared against their stabilized counterparts. Across different experiments, we find a two- to four-fold increase in stability, measured in terms of perturbation amplitude. We also provide a zero-dynamics interpretation of our approach.
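
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea described above, not the authors' algorithm. It assumes a nominal configuration path recorded from training rollouts, the current configuration q, and an instantaneous control-matrix estimate B_hat; the function and parameter names (stabilize_action, nominal_path, gain) are hypothetical.

```python
import numpy as np

def stabilize_action(q, a_policy, nominal_path, B_hat, gain=1.0):
    """Hedged sketch of path-space stabilization.

    Projects the current configuration q onto a nominal configuration
    path (recorded from training rollouts) and adds a corrective term
    to the action proposed by the learned policy.

    q            : current configuration, shape (n,)
    a_policy     : action proposed by the learned policy, shape (m,)
    nominal_path : configurations along the nominal path, shape (K, n)
    B_hat        : instantaneous control-matrix estimate, shape (n, m),
                   mapping action changes to configuration-velocity changes
    gain         : feedback gain (hypothetical tuning parameter)
    """
    # Nearest point on the nominal path (crude projection in configuration space).
    idx = np.argmin(np.linalg.norm(nominal_path - q, axis=1))
    q_ref = nominal_path[idx]

    # Deviation from the path, to be driven to zero.
    e = q_ref - q

    # Map the desired configuration correction through the pseudo-inverse
    # of the estimated control matrix to obtain an action correction.
    da = np.linalg.pinv(B_hat) @ (gain * e)

    return a_policy + da
```

The nearest-neighbor projection is the simplest possible choice here; a smoother path parameterization would play the same role of measuring deviation from the nominal path before feeding it back through the control-matrix estimate.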