Vision-Based Manipulators Need to Also See from Their Hands

Paper Title

Vision-Based Manipulators Need to Also See from Their Hands

Authors

Kyle Hsu, Moo Jin Kim, Rafael Rafailov, Jiajun Wu, Chelsea Finn

Abstract

We study how the choice of visual perspective affects learning and generalization in the context of physical manipulation from raw sensor observations. Compared with the more commonly used global third-person perspective, a hand-centric (eye-in-hand) perspective affords reduced observability, but we find that it consistently improves training efficiency and out-of-distribution generalization. These benefits hold across a variety of learning algorithms, experimental settings, and distribution shifts, and for both simulated and real robot apparatuses. However, this is only the case when hand-centric observability is sufficient; otherwise, including a third-person perspective is necessary for learning, but also harms out-of-distribution generalization. To mitigate this, we propose to regularize the third-person information stream via a variational information bottleneck. On six representative manipulation tasks with varying hand-centric observability adapted from the Meta-World benchmark, this results in a state-of-the-art reinforcement learning agent operating from both perspectives improving its out-of-distribution generalization on every task. While some practitioners have long put cameras in the hands of robots, our work systematically analyzes the benefits of doing so and provides simple and broadly applicable insights for improving end-to-end learned vision-based robotic manipulation.
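The abstract names a variational information bottleneck as the regularizer on the third-person stream but gives no formula. As a rough illustration only, the sketch below shows the term such a bottleneck commonly adds to a loss: the KL divergence between a diagonal Gaussian latent and a standard normal prior, weighted by a coefficient. The function names, the `beta` weight, the latent values, and the choice of a standard normal prior are all assumptions made for illustration; none of this is taken from the paper's implementation.

```python
import math
import random


def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, diag(exp(log_var))) || N(0, I)), summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv
        for m, lv in zip(mean, log_var)
    )


def sample_latent(mean, log_var, rng):
    """Reparameterized sample: z = mean + std * eps, with eps ~ N(0, 1)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


def regularized_loss(task_loss, mean, log_var, beta):
    """Task loss plus a beta-weighted bottleneck penalty on one latent."""
    return task_loss + beta * kl_to_standard_normal(mean, log_var)


# Hypothetical encoder outputs for the third-person view (illustrative values).
rng = random.Random(0)
third_person_mean = [0.5, -1.0]
third_person_log_var = [0.0, -0.5]

# The downstream policy would consume the sampled latent z.
z = sample_latent(third_person_mean, third_person_log_var, rng)

# Only the third-person latent is penalized in this sketch.
loss = regularized_loss(
    task_loss=1.0,
    mean=third_person_mean,
    log_var=third_person_log_var,
    beta=0.01,
)
```

In this sketch a larger `beta` pushes the third-person latent toward the prior, limiting how much information that stream can carry. The hand-centric stream is left unpenalized here to mirror the abstract's description.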
