Paper Title


CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

Authors

Yanan Zhang, Jiaxin Chen, Di Huang

Abstract


In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities with complementary cues for 3D object detection. However, it is quite difficult to sufficiently use them, due to large inter-modal discrepancies. To address this issue, we propose a novel framework, namely Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch, an Imageformer (IT) branch along with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploring multi-modal information for detection. Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach via hierarchical contrastive learning at both the point and object levels, significantly improving the accuracy only by augmenting point-clouds, which is free from complex generation of paired samples of the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves a new state-of-the-art, highlighting its effectiveness.
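The core of the Cross-Modal Transformer (CMT) module described above is letting tokens from one modality attend to tokens from the other. The sketch below illustrates that idea with a minimal single-head cross-attention step in NumPy, where point tokens act as queries over image tokens; all names and shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(point_tokens, image_tokens):
    """One cross-attention step: point tokens (queries) attend to
    image tokens (keys/values), injecting inter-modal context into
    the point-cloud representation."""
    d = point_tokens.shape[-1]
    # Scaled dot-product attention scores, shape (N_points, N_pixels)
    scores = point_tokens @ image_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Each point token becomes a weighted mix of image tokens
    return weights @ image_tokens

rng = np.random.default_rng(0)
pts = rng.standard_normal((16, 32))    # 16 point tokens, 32-dim
imgs = rng.standard_normal((24, 32))   # 24 image tokens, 32-dim
fused = cross_modal_attention(pts, imgs)
print(fused.shape)  # (16, 32): one image-conditioned feature per point token
```

In the full model this step would be wrapped with learned query/key/value projections, multiple heads, and residual connections, and applied symmetrically so each modality can attend to the other.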
