Paper Title


CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection

Authors

Yanan Zhang, Jiaxin Chen, Di Huang

Abstract


In autonomous driving, LiDAR point-clouds and RGB images are two major data modalities with complementary cues for 3D object detection. However, it is quite difficult to sufficiently use them, due to large inter-modal discrepancies. To address this issue, we propose a novel framework, namely Contrastively Augmented Transformer for multi-modal 3D object Detection (CAT-Det). Specifically, CAT-Det adopts a two-stream structure consisting of a Pointformer (PT) branch, an Imageformer (IT) branch along with a Cross-Modal Transformer (CMT) module. PT, IT and CMT jointly encode intra-modal and inter-modal long-range contexts for representing an object, thus fully exploring multi-modal information for detection. Furthermore, we propose an effective One-way Multi-modal Data Augmentation (OMDA) approach via hierarchical contrastive learning at both the point and object levels, significantly improving the accuracy only by augmenting point-clouds, which is free from complex generation of paired samples of the two modalities. Extensive experiments on the KITTI benchmark show that CAT-Det achieves a new state-of-the-art, highlighting its effectiveness.
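The core of the Cross-Modal Transformer (CMT) module described above is letting tokens from one modality attend to tokens from the other. The sketch below illustrates that idea with a minimal single-head cross-attention step in NumPy, where point tokens act as queries over image tokens; all names and shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(point_tokens, image_tokens):
    """One cross-attention step: point tokens (queries) attend to
    image tokens (keys/values), injecting inter-modal context into
    the point-cloud representation."""
    d = point_tokens.shape[-1]
    # Scaled dot-product attention scores, shape (N_points, N_pixels)
    scores = point_tokens @ image_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    # Each point token becomes a weighted mix of image tokens
    return weights @ image_tokens

rng = np.random.default_rng(0)
pts = rng.standard_normal((16, 32))    # 16 point tokens, 32-dim
imgs = rng.standard_normal((24, 32))   # 24 image tokens, 32-dim
fused = cross_modal_attention(pts, imgs)
print(fused.shape)  # (16, 32): one image-conditioned feature per point token
```

In the full model this step would be wrapped with learned query/key/value projections, multiple heads, and residual connections, and applied symmetrically so each modality can attend to the other.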
