Paper Title

Depth by Poking: Learning to Estimate Depth from Self-Supervised Grasping

Paper Authors

Ben Goodrich, Alex Kuefler, William D. Richards

Paper Abstract

Accurate depth estimation remains an open problem for robotic manipulation; even state-of-the-art techniques, including structured light and LiDAR sensors, fail on reflective or transparent surfaces. We address this problem by training a neural network model to estimate depth from RGB-D images, using labels from physical interactions between a robot and its environment. Our network predicts, for each pixel in an input image, the z position that a robot's end effector would reach if it attempted to grasp or poke at the corresponding position. Given an autonomous grasping policy, our approach is self-supervised, as end effector position labels can be recovered through forward kinematics, without human annotation. Although gathering such physical interaction data is expensive, it is necessary for training and routine operation of state-of-the-art manipulation systems. Therefore, this depth estimator comes "for free" while collecting data for other tasks (e.g., grasping, pushing, placing). We show our approach achieves significantly lower root mean squared error than traditional structured light sensors and unsupervised deep learning methods on difficult, industry-scale jumbled bin datasets.
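To make the supervision scheme concrete, below is a minimal sketch (not the authors' code; the paper does not specify a framework or architecture) of the idea in PyTorch: a fully convolutional network maps a 4-channel RGB-D image to a per-pixel z map, and a masked regression loss is applied only at the sparse pixels where the end effector actually made contact, with the z label taken from forward kinematics. The names `DepthByPokingNet` and `masked_mse`, the layer sizes, and all tensor shapes are hypothetical.

```python
# Minimal sketch (assumed PyTorch; architecture and names are hypothetical).
import torch
import torch.nn as nn

class DepthByPokingNet(nn.Module):
    """Toy encoder-decoder: RGB-D (B, 4, H, W) -> per-pixel z map (B, 1, H, W)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, rgbd):
        return self.decoder(self.encoder(rgbd))

def masked_mse(pred_z, label_z, mask):
    """MSE only at pixels the end effector visited (mask == 1); label_z
    comes from forward kinematics rather than human annotation."""
    diff = (pred_z - label_z) * mask
    return diff.pow(2).sum() / mask.sum().clamp(min=1)

# One toy training step on synthetic tensors (values are illustrative only).
model = DepthByPokingNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

rgbd = torch.randn(2, 4, 64, 64)      # RGB-D input batch
label_z = torch.zeros(2, 1, 64, 64)   # end-effector z at contact pixels
mask = torch.zeros(2, 1, 64, 64)      # 1 where a grasp/poke produced a label
mask[:, :, 32, 32] = 1.0              # e.g., one contacted pixel per image
label_z[:, :, 32, 32] = 0.45          # e.g., 0.45 m in the robot base frame

opt.zero_grad()
loss = masked_mse(model(rgbd), label_z, mask)
loss.backward()
opt.step()
```

The mask is the crux of the self-supervision: each grasp or poke labels only the pixel(s) the end effector touched, so the loss must ignore all unlabeled pixels, and dense depth estimates emerge from training over many such sparse interactions.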
