StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

Paper title

StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS

Authors

Kai Chen, Stephen James, Congying Sui, Yun-Hui Liu, Pieter Abbeel, Qi Dou

Abstract

Most existing methods for category-level pose estimation rely on object point clouds. However, when considering transparent objects, depth cameras are usually not able to capture meaningful data, resulting in point clouds with severe artifacts. Without a high-quality point cloud, existing methods are not applicable to challenging transparent objects. To tackle this problem, we present StereoPose, a novel stereo image framework for category-level object pose estimation, ideally suited for transparent objects. For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement. StereoPose then estimates object pose based on representation in the normalized object coordinate space (NOCS). To address the issue of image content aliasing, we further define a back-view NOCS map for the transparent object. The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation. To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions. Extensive experiments on the public TOD dataset demonstrate the superiority of the proposed StereoPose framework for category-level 6D transparent object pose estimation.
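
The abstract does not give the form of the epipolar loss, so the following is only a minimal sketch of the geometric constraint such a loss could build on, not the authors' formulation. It assumes a rectified stereo pair, in which a correct left/right correspondence lies on the same image row and depth follows from disparity as Z = f * B / d. The names `Match`, `mean_epipolar_penalty`, and `depth_from_disparity`, and all numeric values, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Match:
    """Hypothetical left/right pixel correspondence in a rectified stereo pair."""
    u_left: float
    v_left: float
    u_right: float
    v_right: float


def epipolar_residual(m: Match) -> float:
    """Row mismatch in pixels; zero when both pixels lie on the same scanline."""
    return abs(m.v_left - m.v_right)


def mean_epipolar_penalty(matches: Sequence[Match]) -> float:
    """Average row mismatch, an illustrative stand-in for a consistency penalty."""
    if not matches:
        return 0.0
    return sum(epipolar_residual(m) for m in matches) / len(matches)


def depth_from_disparity(m: Match, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified pair, with d = u_left - u_right."""
    disparity = m.u_left - m.u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Assumed example values: focal length 600 px, baseline 0.125 m.
matches = [
    Match(320.0, 240.0, 300.0, 240.0),  # consistent: same row in both views
    Match(100.0, 50.0, 90.0, 52.0),     # inconsistent: rows differ by 2 px
]
penalty = mean_epipolar_penalty(matches)
depth = depth_from_disparity(matches[0], focal_px=600.0, baseline_m=0.125)
```

In this sketch the penalty grows with the row disagreement between the two views, which is the sense in which an epipolar term can encourage stereo-view consistency. A learned pipeline would apply a differentiable version of the same idea to network predictions rather than to hand-picked pixel pairs.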
