Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration

Paper Title

Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration

Authors

Gao, Xiang; Si, Jennie; Wen, Yue; Li, Minhan; He, Huang

Abstract

We are motivated by the real challenges presented in a human-robot system to develop new designs that are efficient at the data level and offer performance guarantees, such as stability and optimality, at the system level. Existing approximate/adaptive dynamic programming (ADP) results that consider system performance theoretically do not readily provide practically useful learning control algorithms for this problem, and reinforcement learning (RL) algorithms that address data efficiency usually lack performance guarantees for the controlled system. This study fills these important voids by introducing innovative features to the policy iteration algorithm. We introduce flexible policy iteration (FPI), which can flexibly and organically integrate experience replay and supplemental values from prior experience into the RL controller. We show system-level performance properties, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system. We demonstrate the effectiveness of FPI via realistic simulations of the human-robot system. We note that the problem we face in this study may be difficult to address with design methods based on classical control theory, as it is nearly impossible to obtain a customized mathematical model of a human-robot system either online or offline. The results we have obtained also indicate the great potential of RL control for solving realistic and challenging problems with high-dimensional control inputs.
