DUQIM-Net: Probabilistic Object Hierarchy Representation for Multi-View Manipulation

Paper Title

DUQIM-Net: Probabilistic Object Hierarchy Representation for Multi-View Manipulation

Authors

Vladimir Tchuiev, Yakov Miron, Dotan Di-Castro

Abstract

Object manipulation in cluttered scenes is a difficult and important problem in robotics. To manipulate objects efficiently, it is crucial to understand their surroundings, especially when multiple objects are stacked one on top of the other and prevent effective grasping. Here we present DUQIM-Net, a decision-making approach for object manipulation in a setting of stacked objects. In DUQIM-Net, the hierarchical stacking relationship is assessed using Adj-Net, a model that leverages existing Transformer Encoder-Decoder object detectors by adding an adjacency head. The output of this head probabilistically infers the underlying hierarchical structure of the objects in the scene. We use the properties of the adjacency matrix in DUQIM-Net to perform decision making and assist with object-grasping tasks. Our experimental results show that Adj-Net surpasses the state of the art in object-relationship inference on the Visual Manipulation Relationship Dataset (VMRD), and that DUQIM-Net outperforms comparable approaches in bin-clearing tasks.
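The abstract does not specify how DUQIM-Net turns the probabilistic adjacency matrix into a grasp decision. The sketch below is only an illustration of the general idea, and it is not the authors' algorithm. It makes three assumptions. First, it uses a hypothetical convention where `adj[i, j]` is the probability that object j rests on top of object i. Second, it treats the pairwise relations as independent. Third, it greedily grasps the object that is most likely to have nothing stacked on it. The function names `clear_probabilities` and `bin_clearing_order` are hypothetical.

```python
import numpy as np


def clear_probabilities(adj: np.ndarray) -> np.ndarray:
    """Probability that each object has nothing stacked on top of it.

    Hypothetical convention (assumption): adj[i, j] = P(object j rests on object i).
    Pairwise relations are assumed independent, so P(i is clear) = prod_j (1 - adj[i, j]).
    """
    adj = np.asarray(adj, dtype=float)
    if adj.ndim != 2 or adj.shape[0] != adj.shape[1]:
        raise ValueError("adjacency matrix must be square")
    blocked = adj.copy()
    # An object cannot rest on itself, so ignore the diagonal.
    np.fill_diagonal(blocked, 0.0)
    return np.prod(1.0 - blocked, axis=1)


def bin_clearing_order(adj: np.ndarray) -> list[int]:
    """Greedy, open-loop removal order: always grasp the most-likely-clear object."""
    sub = np.asarray(adj, dtype=float)
    remaining = list(range(sub.shape[0]))  # original indices of objects still in the bin
    order = []
    while remaining:
        k = int(np.argmax(clear_probabilities(sub)))
        order.append(remaining.pop(k))
        # Remove the grasped object's row and column before the next decision.
        sub = np.delete(np.delete(sub, k, axis=0), k, axis=1)
    return order
```

Consider a three-object stack in which object 1 likely rests on object 0 and object 2 likely rests on object 1. The greedy rule clears the stack from the top down, giving `[2, 1, 0]`. A real system would re-run detection after every grasp instead of reusing the first matrix. The open-loop order shown here only illustrates how such a rule ranks objects.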
