Paper Title
Feature Expansive Reward Learning: Rethinking Human Input
Paper Authors
Paper Abstract
When a person is not satisfied with how a robot performs a task, they can intervene to correct it. Reward learning methods enable the robot to adapt its reward function online based on such human input, but they rely on handcrafted features. When the correction cannot be explained by these features, recent work in deep Inverse Reinforcement Learning (IRL) suggests that the robot could ask for task demonstrations and recover a reward defined over the raw state space. Our insight is that rather than implicitly learning about the missing feature(s) from demonstrations, the robot should instead ask for data that explicitly teaches it about what it is missing. We introduce a new type of human input in which the person guides the robot from states where the feature being taught is highly expressed to states where it is not. We propose an algorithm for learning the feature from the raw state space and integrating it into the reward function. By focusing the human input on the missing feature, our method decreases sample complexity and improves generalization of the learned reward over the above deep IRL baseline. We show this in experiments with a physical 7DOF robot manipulator, as well as in a user study conducted in a simulated environment.
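The abstract describes learning a missing feature from human-provided "traces" (guiding the robot from states where the feature is highly expressed to states where it is not) and folding it into the reward. The sketch below is only an illustration of that idea, not the authors' implementation: it assumes the feature can be fit as a small neural network trained with a margin-ranking loss so its output decreases along each trace, and that the reward is a weighted sum of handcrafted features plus the learned one. The architecture, loss, and toy data are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above), not the paper's actual algorithm.
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Maps a raw state to a scalar feature value in [0, 1]."""
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state).squeeze(-1)

def train_feature(traces, state_dim, epochs=200, lr=1e-3):
    """Fit the feature so earlier states in each trace (where the person indicates
    the feature is highly expressed) score higher than later states."""
    feat = FeatureNet(state_dim)
    opt = torch.optim.Adam(feat.parameters(), lr=lr)
    rank_loss = nn.MarginRankingLoss(margin=0.1)  # assumed monotonicity objective
    for _ in range(epochs):
        for trace in traces:                      # trace: (T, state_dim), high -> low
            values = feat(trace)                  # (T,)
            earlier, later = values[:-1], values[1:]
            target = torch.ones_like(earlier)     # earlier should outrank later
            loss = rank_loss(earlier, later, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return feat

def reward(state, known_features, learned_feature, weights):
    """Reward as a weighted sum of handcrafted features plus the newly learned one."""
    phi = torch.stack([f(state) for f in known_features] + [learned_feature(state)])
    return weights @ phi

if __name__ == "__main__":
    # Toy example: 7-dimensional state; the hidden feature is the first coordinate.
    torch.manual_seed(0)
    traces = [torch.linspace(1.0, 0.0, 20).unsqueeze(1) * torch.ones(20, 7)
              for _ in range(5)]
    learned = train_feature(traces, state_dim=7)
    dist_to_goal = lambda s: torch.norm(s)        # placeholder handcrafted feature
    w = torch.tensor([-1.0, -1.0])                # example weights (e.g., updated online)
    print(reward(torch.rand(7), [dist_to_goal], learned, w))
```

In this sketch the learned feature simply becomes one more term in a linear reward; the abstract's point is that asking the person to demonstrate the feature directly, rather than inferring it from full task demonstrations, concentrates the data on exactly what the robot is missing.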