Paper Title

Exploring Data Augmentation for Multi-Modality 3D Object Detection

Authors

Wenwei Zhang, Zhe Wang, Chen Change Loy

Abstract

It is counter-intuitive that multi-modality methods based on point cloud and images perform only marginally better, or sometimes worse, than approaches that solely use point cloud. This paper investigates the reason behind this phenomenon. Because multi-modality data augmentation must maintain consistency between the point cloud and images, recent methods in this field typically use relatively insufficient data augmentation, which keeps their performance below expectation. Therefore, we contribute a pipeline, named transformation flow, to bridge the gap between single- and multi-modality data augmentation through transformation reversing and replaying. In addition, because of occlusion, a point in different modalities may be occupied by different objects, making augmentations such as cut-and-paste non-trivial for multi-modality detection. We further present Multi-mOdality Cut and pAste (MoCa), which simultaneously considers occlusion and physical plausibility to maintain multi-modality consistency. Without using an ensemble of detectors, our multi-modality detector achieves new state-of-the-art performance on the nuScenes dataset and competitive performance on the KITTI 3D benchmark. Our method also won the best PKL award in the 3rd nuScenes detection challenge. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.
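The "transformation reversing and replaying" idea in the abstract can be sketched as follows: each point-cloud augmentation is recorded as it is applied, so the augmented points can later be mapped back to the original sensor frame that the images were captured in. This is only a minimal illustration of the concept; the class and function names below are hypothetical, not the paper's actual API.

```python
import numpy as np

class TransformationFlow:
    """Records point-cloud augmentations so they can be reversed later,
    restoring correspondence with the (un-augmented) image modality.
    A conceptual sketch, not the paper's implementation."""

    def __init__(self):
        self._matrices = []  # 3x3 transforms applied to the points, in order

    def apply(self, points, matrix):
        # Apply one augmentation (points are row vectors, hence matrix.T).
        self._matrices.append(matrix)
        return points @ matrix.T

    def reverse(self, points):
        # Undo every recorded transform in reverse order to recover the
        # original frame shared with the camera images.
        for m in reversed(self._matrices):
            points = points @ np.linalg.inv(m).T
        return points


def rotation_z(theta):
    """Rotation about the z (up) axis, a common point-cloud augmentation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


flow = TransformationFlow()
pts = np.random.rand(5, 3)
aug = flow.apply(pts, rotation_z(0.3))            # random rotation
aug = flow.apply(aug, np.diag([-1.0, 1.0, 1.0]))  # flip along the x axis
restored = flow.reverse(aug)
assert np.allclose(restored, pts)  # back in the image-aligned frame
```

Once the augmented points are reversed into the original frame, they can be projected into the image with the camera calibration, and any 2D augmentation can then be "replayed" on that projection, which is the consistency the pipeline is designed to preserve.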
