Paper Title

CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer

Paper Authors

Youngseok Kim, Sanmin Kim, Jun Won Choi, Dongsuk Kum

Paper Abstract

Camera and radar sensors have significant advantages over LiDAR in cost, reliability, and maintenance. Existing fusion methods often fuse the outputs of the individual modalities at the result level, a strategy known as late fusion. This benefits from off-the-shelf single-sensor detection algorithms, but late fusion cannot fully exploit the complementary properties of the sensors, and therefore its performance remains limited despite the large potential of camera-radar fusion. Here we propose a novel proposal-level early fusion approach that effectively exploits both the spatial and contextual properties of camera and radar for 3D object detection. Our fusion framework first associates image proposals with radar points in the polar coordinate system to efficiently handle the discrepancy between the coordinate systems and spatial properties. Using this as a first stage, consecutive cross-attention-based feature fusion layers then adaptively exchange spatio-contextual information between camera and radar, leading to a robust and attentive fusion. Our camera-radar fusion approach achieves a state-of-the-art 41.1% mAP and 52.3% NDS on the nuScenes test set, 8.7 and 10.8 points higher than the camera-only baseline, while also yielding performance competitive with LiDAR-based methods.
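As a rough illustration of the cross-attention-based feature fusion described in the abstract, the sketch below shows how image-proposal features and associated radar-point features could exchange information through bidirectional cross-attention with residual connections. This is a minimal, hypothetical PyTorch layer for intuition only; the class name, feature dimensions, and structure are assumptions and do not reproduce the paper's actual Spatio-Contextual Fusion Transformer.

```python
# Minimal illustrative sketch of cross-attention camera-radar feature fusion.
# NOT the authors' implementation: shapes, sizes, and names are hypothetical.
import torch
import torch.nn as nn


class CrossAttentionFusionLayer(nn.Module):
    """One fusion layer: image-proposal features attend to the radar-point
    features associated with them, and vice versa, exchanging
    spatio-contextual information between the two modalities."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Camera queries attend to radar keys/values, and radar queries to camera.
        self.cam_from_radar = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.radar_from_cam = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_cam = nn.LayerNorm(dim)
        self.norm_radar = nn.LayerNorm(dim)

    def forward(self, cam_feat: torch.Tensor, radar_feat: torch.Tensor):
        # cam_feat:   (B, N_proposals, dim) image-proposal features
        # radar_feat: (B, N_points, dim)    features of the radar points that
        #                                   were associated with the proposals
        cam_upd, _ = self.cam_from_radar(cam_feat, radar_feat, radar_feat)
        radar_upd, _ = self.radar_from_cam(radar_feat, cam_feat, cam_feat)
        cam_feat = self.norm_cam(cam_feat + cam_upd)        # residual + norm
        radar_feat = self.norm_radar(radar_feat + radar_upd)
        return cam_feat, radar_feat


if __name__ == "__main__":
    layer = CrossAttentionFusionLayer()
    cam = torch.randn(2, 100, 256)   # e.g., 100 image proposals per sample
    radar = torch.randn(2, 64, 256)  # e.g., 64 associated radar points per sample
    cam_out, radar_out = layer(cam, radar)
    print(cam_out.shape, radar_out.shape)
```

In the paper, several such layers are stacked consecutively so that both modalities are refined iteratively; the sketch shows only a single exchange step.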
