Paper Title

Offline Learning of Counterfactual Predictions for Real-World Robotic Reinforcement Learning

Paper Authors

Jun Jin, Daniel Graves, Cameron Haigh, Jun Luo, Martin Jagersand

Paper Abstract

We consider real-world reinforcement learning (RL) of robotic manipulation tasks that involve both visuomotor skills and contact-rich skills. We aim to train a policy that maps multimodal sensory observations (vision and force) to a manipulator's joint velocities under practical considerations. We propose to use offline samples to learn a set of general value functions (GVFs) that make counterfactual predictions from the visual inputs. We show that combining the offline learned counterfactual predictions with force feedback in online policy learning allows efficient reinforcement learning given only a terminal (success/failure) reward. We argue that the learned counterfactual predictions form a compact and informative representation that enables sample efficiency and provides auxiliary reward signals that guide online exploration towards contact-rich states. Various experiments in simulation and real-world settings were performed for evaluation. Recordings of the real-world robot training can be found via https://sites.google.com/view/realrl.
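
To make the GVF idea in the abstract concrete, below is a minimal sketch of learning a single counterfactual prediction offline with off-policy TD(0) and per-decision importance sampling over logged transitions. The feature size, the cumulant, the fixed target policy, and the uniform behavior policy are all placeholder assumptions for illustration; the paper's actual cumulants, policies, and network architecture are not specified in the abstract.

```python
# Minimal sketch (NumPy only) of offline GVF learning via off-policy TD(0).
# All names and constants below are illustrative placeholders, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 32   # size of the (e.g. visual) feature vector phi(s)
N_ACTIONS = 4     # discrete action set of the logged behavior policy (assumed)
GAMMA = 0.9       # continuation of the GVF "question"
ALPHA = 0.1       # TD step size

w = np.zeros(N_FEATURES)   # linear GVF weights: prediction(s) = w . phi(s)

def cumulant(obs, act, next_obs):
    """Hypothetical cumulant the GVF accumulates (e.g. a contact/touch indicator)."""
    return float(next_obs[0] > 0.5)

def target_policy_prob(act, obs):
    """Hypothetical fixed 'counterfactual' target policy pi(a|s) the GVF asks about."""
    return 1.0 if act == 0 else 0.0   # e.g. "keep moving toward the object"

def behavior_policy_prob(act, obs):
    """Probability mu(a|s) of the logged behavior policy (assumed uniform here)."""
    return 1.0 / N_ACTIONS

def td_update(w, obs, act, next_obs):
    """One off-policy TD(0) step with importance sampling on a logged transition."""
    rho = target_policy_prob(act, obs) / behavior_policy_prob(act, obs)
    delta = cumulant(obs, act, next_obs) + GAMMA * (w @ next_obs) - (w @ obs)
    return w + ALPHA * rho * delta * obs

# Offline pass over a batch of logged transitions (random placeholders here;
# in practice these would be feature vectors computed from logged camera images).
for _ in range(10_000):
    obs = rng.random(N_FEATURES)
    act = int(rng.integers(N_ACTIONS))
    next_obs = rng.random(N_FEATURES)
    w = td_update(w, obs, act, next_obs)

# At online policy-learning time, the prediction w . phi(s) could be concatenated
# with force readings as a compact state representation and/or used as an
# auxiliary reward signal, in the spirit of the approach described above.
print("example counterfactual prediction:", w @ rng.random(N_FEATURES))
```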
