Learning to Play Table Tennis From Scratch using Muscular Robots

Paper Title

Learning to Play Table Tennis From Scratch using Muscular Robots

Authors

Dieter Büchler, Simon Guist, Roberto Calandra, Vincent Berenz, Bernhard Schölkopf, Jan Peters

Abstract

Dynamic tasks like table tennis are relatively easy for humans to learn but pose significant challenges to robots. Such tasks require accurate control of fast movements and precise timing despite imprecise state estimates of the flying ball and the robot. Reinforcement Learning (RL) has shown promise in learning complex control tasks from data. However, applying step-based RL to dynamic tasks on real systems is safety-critical, as RL requires exploring and failing safely for millions of time steps in high-speed regimes. In this paper, we demonstrate that safe learning of table tennis using model-free Reinforcement Learning can be achieved by using robot arms driven by pneumatic artificial muscles (PAMs). The softness and back-drivability of PAMs prevent the system from leaving the safe region of its state space. In this manner, RL empowers the robot to return and smash real balls at 5 m/s and 12 m/s on average to a desired landing point. Our setup allows the agent to learn this safety-critical task (i) without safety constraints in the algorithm, (ii) while maximizing the speed of returned balls directly in the reward function, (iii) using a stochastic policy that acts directly on the low-level controls of the real system, (iv) training for thousands of trials, and (v) from scratch without any prior knowledge. Additionally, we present HYSR, a practical hybrid sim and real training procedure that avoids playing real balls during training by randomly replaying recorded ball trajectories in simulation and applying actions to the real robot. This work is the first to (a) learn a safety-critical dynamic task in a fail-safe manner using anthropomorphic robot arms, (b) learn a precision-demanding problem with a PAM-driven system despite the control challenges, and (c) train robots to play table tennis without real balls. Videos and datasets are available at muscularTT.embodied.ml.
