Paper Title
Transporter Networks: Rearranging the Visual World for Robotic Manipulation
Paper Authors
Paper Abstract
Robotic manipulation can be formulated as inducing a sequence of spatial displacements, where the space being moved can encompass an object, part of an object, or an end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input, which can parameterize robot actions. It makes no assumptions of objectness (e.g., canonical poses, models, or keypoints), it exploits spatial symmetries, and it is orders of magnitude more sample efficient than our benchmarked alternatives in learning vision-based manipulation tasks: from stacking a pyramid of blocks, to assembling kits with unseen objects; from manipulating deformable ropes, to pushing piles of small objects with closed-loop feedback. Our method can represent complex multi-modal policy distributions and generalizes to multi-step sequential tasks as well as 6DoF pick-and-place. Experiments on 10 simulated tasks show that it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses. We validate our method with hardware in the real world. Experiment videos and code are available at https://transporternets.github.io
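To make the core idea of "rearranging deep features to infer spatial displacements" concrete, below is a minimal, illustrative sketch in PyTorch: one fully convolutional network scores per-pixel pick locations, and placement is scored by cropping deep features around the chosen pick and cross-correlating that crop against deep features of the full scene. This is not the authors' released implementation; the module names (`SimpleFCN`, `TransporterSketch`), channel counts, crop size, and the single-rotation simplification are assumptions made purely for illustration.

```python
# Minimal sketch of the Transporter attention idea (illustrative only):
# pick = per-pixel FCN scores; place = cross-correlation of deep features
# cropped around the pick with deep features of the whole scene.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFCN(nn.Module):
    """Tiny fully convolutional encoder (a stand-in for the paper's FCNs)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

class TransporterSketch(nn.Module):
    def __init__(self, in_ch=4, feat_ch=16, crop=33):
        super().__init__()
        self.crop = crop  # odd crop size keeps the correlation map scene-sized
        self.pick_net = SimpleFCN(in_ch, 1)         # per-pixel pick logits
        self.query_net = SimpleFCN(in_ch, feat_ch)  # features for the picked crop
        self.key_net = SimpleFCN(in_ch, feat_ch)    # features for the full scene

    def forward(self, obs):
        # obs: (1, C, H, W) top-down observation (e.g. an RGB-D heightmap).
        h, w = obs.shape[-2:]
        pick_logits = self.pick_net(obs)                   # (1, 1, H, W)
        idx = pick_logits.flatten(1).argmax(dim=1).item()  # greedy pick pixel
        py, px = divmod(idx, w)

        # Crop deep features around the pick and use them as a convolution
        # kernel over the scene features: "transporting" the picked region.
        query = self.query_net(obs)
        key = self.key_net(obs)
        c = self.crop // 2
        padded = F.pad(query, (c, c, c, c))
        kernel = padded[:, :, py:py + self.crop, px:px + self.crop]  # (1, F, crop, crop)
        place_logits = F.conv2d(key, kernel, padding=c)              # (1, 1, H, W)
        return pick_logits, place_logits, (py, px)

if __name__ == "__main__":
    model = TransporterSketch()
    obs = torch.randn(1, 4, 64, 64)  # 4 channels standing in for color + height
    pick_logits, place_logits, pick_px = model(obs)
    print(pick_px, pick_logits.shape, place_logits.shape)
```

In the full method, the cropped query is evaluated over a set of discrete rotations (and extended further for 6DoF pick-and-place), so the placement argmax ranges over both pixels and rotations; the sketch above keeps a single rotation for brevity.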