MonoGraspNet: 6-DoF Grasping with a Single RGB Image

Paper Title

MonoGraspNet: 6-DoF Grasping with a Single RGB Image

Authors

Guangyao Zhai, Dianye Huang, Shun-Cheng Wu, Hyunjun Jung, Yan Di, Fabian Manhardt, Federico Tombari, Nassir Navab, Benjamin Busam

Abstract

6-DoF robotic grasping is a long-standing but unsolved problem. Recent methods use strong 3D networks to extract geometric grasping representations from depth sensor data. They achieve superior accuracy on common objects but perform unsatisfactorily on photometrically challenging objects, e.g., objects made of transparent or reflective materials. The bottleneck is that the surfaces of these objects cannot return accurate depth because of the absorption or refraction of light. In this paper, instead of exploiting the inaccurate depth data, we propose the first RGB-only 6-DoF grasping pipeline, called MonoGraspNet, which uses stable 2D features to handle arbitrary object grasping while overcoming the problems induced by photometrically challenging objects. MonoGraspNet leverages a keypoint heatmap and a normal map to recover 6-DoF grasping poses, expressed in our novel representation parameterized by 2D keypoints with corresponding depth, grasping direction, grasping width, and angle. Extensive experiments in real scenes demonstrate that our method achieves competitive results in grasping common objects and surpasses the depth-based competitor by a large margin in grasping photometrically challenging objects. To further stimulate robotic manipulation research, we additionally annotate and open-source a multi-view and multi-scene real-world grasping dataset, containing 120 objects of mixed photometric complexity with 20M accurate grasping labels.
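To illustrate how the parameters listed in the abstract (a 2D keypoint with its depth, a grasping direction, a width, and an angle) could determine a full 6-DoF pose, here is a minimal sketch. It is not the paper's decoding procedure: the pinhole camera model, the choice of the grasping direction as the gripper's z-axis, the reading of the angle as an in-plane rotation about that axis, and the names `backproject` and `grasp_pose` are all assumptions made for illustration.

```python
import numpy as np


def backproject(u, v, depth, K):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera with 3x3 intrinsics K (no distortion).
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])


def grasp_pose(u, v, depth, direction, angle, width, K):
    """Assemble a 4x4 gripper pose (camera frame) and pass the width through.

    Hypothetical convention: `direction` becomes the gripper z-axis and
    `angle` (radians) rotates the gripper x-axis about that z-axis.
    """
    center = backproject(u, v, depth, K)
    z = np.asarray(direction, dtype=float)
    z = z / np.linalg.norm(z)

    # Pick a reference vector that is not (nearly) parallel to z.
    ref = np.array([1.0, 0.0, 0.0]) if abs(z[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    x0 = np.cross(ref, z)
    x0 = x0 / np.linalg.norm(x0)
    y0 = np.cross(z, x0)  # (x0, y0, z) is a right-handed orthonormal basis

    # Rotate the in-plane axes about z by the given angle.
    x = np.cos(angle) * x0 + np.sin(angle) * y0
    y = np.cross(z, x)

    T = np.eye(4)
    T[:3, 0], T[:3, 1], T[:3, 2], T[:3, 3] = x, y, z, center
    return T, width
```

Under these assumptions, the keypoint and depth fix the translation (3 degrees of freedom), while the direction and angle fix the rotation (3 more); the width only sets the gripper opening.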
