Paper Title

Unified Learning from Demonstrations, Corrections, and Preferences during Physical Human-Robot Interaction

Paper Authors

Mehta, Shaunak A.; Losey, Dylan P.

Paper Abstract

Humans can leverage physical interaction to teach robot arms. This physical interaction takes multiple forms depending on the task, the user, and what the robot has learned so far. State-of-the-art approaches focus on learning from a single modality, or combine multiple interaction types by assuming that the robot has prior information about the human's intended task. By contrast, in this paper we introduce an algorithmic formalism that unites learning from demonstrations, corrections, and preferences. Our approach makes no assumptions about the tasks the human wants to teach the robot; instead, we learn a reward model from scratch by comparing the human's inputs to nearby alternatives. We first derive a loss function that trains an ensemble of reward models to match the human's demonstrations, corrections, and preferences. The type and order of feedback is up to the human teacher: we enable the robot to collect this feedback passively or actively. We then apply constrained optimization to convert our learned reward into a desired robot trajectory. Through simulations and a user study we demonstrate that our proposed approach more accurately learns manipulation tasks from physical human interaction than existing baselines, particularly when the robot is faced with new or unexpected objectives. Videos of our user study are available at: https://youtu.be/FSUJsTYvEKU
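
The abstract's central idea is that all three feedback types can be reduced to comparisons between the human's input and nearby alternative trajectories, with an ensemble of reward models trained to prefer the human's input. Below is a minimal, hypothetical sketch of that idea in PyTorch; the RewardNet architecture, the Bradley-Terry-style comparison loss, and the way pairs are formed are illustrative assumptions, not the paper's actual implementation.

    # Illustrative sketch (not the authors' code): reduce demonstrations,
    # corrections, and preferences to trajectory comparisons and train an
    # ensemble of reward models with a Bradley-Terry-style loss.
    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        """Small MLP mapping a flattened trajectory to a scalar reward."""
        def __init__(self, traj_dim: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(traj_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, traj: torch.Tensor) -> torch.Tensor:
            return self.net(traj).squeeze(-1)  # scalar reward per trajectory

    def comparison_loss(model, xi_plus, xi_minus):
        # P(xi_plus preferred) = sigmoid(R(xi_plus) - R(xi_minus));
        # softplus(-x) = -log(sigmoid(x)), so this maximizes log-likelihood.
        logits = model(xi_plus) - model(xi_minus)
        return nn.functional.softplus(-logits).mean()

    traj_dim, n_models = 20, 5  # flattened trajectory features; ensemble size
    ensemble = [RewardNet(traj_dim) for _ in range(n_models)]
    optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in ensemble]

    # Every feedback type yields (preferred, alternative) pairs: a demo vs.
    # a perturbed copy, a corrected trajectory vs. the original, the chosen
    # option vs. the rejected one. Placeholder random data stands in here.
    xi_plus = torch.randn(32, traj_dim)
    xi_minus = xi_plus + 0.1 * torch.randn(32, traj_dim)  # nearby alternatives

    for model, opt in zip(ensemble, optimizers):
        loss = comparison_loss(model, xi_plus, xi_minus)
        opt.zero_grad()
        loss.backward()
        opt.step()

In the paper's full pipeline, the robot would then obtain its desired trajectory via constrained optimization over the learned (ensemble) reward; this sketch stops at reward training.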
