TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection

Paper Title

TransPillars: Coarse-to-Fine Aggregation for Multi-Frame 3D Object Detection

Authors

Luo, Zhipeng; Zhang, Gongjie; Zhou, Changqing; Liu, Tianrui; Lu, Shijian; Pan, Liang

Abstract

3D object detection using point clouds has attracted increasing attention due to its wide applications in autonomous driving and robotics. However, most existing studies focus on single point cloud frames without harnessing the temporal information in point cloud sequences. In this paper, we design TransPillars, a novel transformer-based feature aggregation technique that exploits temporal features of consecutive point cloud frames for multi-frame 3D object detection. TransPillars aggregates spatial-temporal point cloud features from two perspectives. First, it fuses voxel-level features directly from multi-frame feature maps instead of pooled instance features, to preserve instance details and contextual information, which are essential to accurate object localization. Second, it introduces a hierarchical coarse-to-fine strategy that fuses multi-scale features progressively, to effectively capture the motion of moving objects and guide the aggregation of fine features. In addition, a variant of the deformable transformer is introduced to improve the effectiveness of cross-frame feature matching. Extensive experiments show that our proposed TransPillars achieves state-of-the-art performance compared to existing multi-frame detection approaches. Code will be released.
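The coarse-to-fine idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it swaps the deformable transformer for plain dot-product attention over a local window, and it builds the coarse scale by 2x2 average pooling rather than taking it from a detection backbone. The function names `window_attention` and `coarse_to_fine_fuse`, the `radius` parameter, and the residual fusion at the end are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def window_attention(cur, prev, centers, radius):
    """Attend from each cell of `cur` (H, W, C) to a (2r+1)^2 window of `prev`.

    `centers` (H, W, 2) holds the integer (row, col) window centre per cell.
    Returns the aggregated features (H, W, C) and the attention-weighted
    offset (H, W, 2) of the attended positions relative to each cell.
    """
    H, W, C = cur.shape
    offs = np.array([(dy, dx)
                     for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)])      # (K, 2)
    pos = centers[:, :, None, :] + offs[None, None, :, :]        # (H, W, K, 2)
    pos[..., 0] = np.clip(pos[..., 0], 0, H - 1)                 # stay inside the map
    pos[..., 1] = np.clip(pos[..., 1], 0, W - 1)
    keys = prev[pos[..., 0], pos[..., 1]]                        # (H, W, K, C)
    scores = np.einsum("hwc,hwkc->hwk", cur, keys) / np.sqrt(C)
    attn = softmax(scores, axis=-1)
    fused = np.einsum("hwk,hwkc->hwc", attn, keys)
    grid = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1)
    mean_pos = np.einsum("hwk,hwkd->hwd", attn, pos.astype(float))
    return fused, mean_pos - grid


def coarse_to_fine_fuse(cur_fine, prev_fine, radius=1):
    """Fuse the previous frame into the current one, coarse scale first.

    Assumes even H and W. The coarse pass estimates a per-cell motion offset;
    the fine pass centres its attention window on that (upsampled) offset.
    """
    def pool(x):  # 2x2 average pooling as a stand-in for a coarser feature scale
        H, W, C = x.shape
        return x.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

    cur_c, prev_c = pool(cur_fine), pool(prev_fine)
    Hc, Wc, _ = cur_c.shape
    grid_c = np.stack(np.meshgrid(np.arange(Hc), np.arange(Wc), indexing="ij"), axis=-1)
    _, motion_c = window_attention(cur_c, prev_c, grid_c, radius)

    # Upsample the coarse motion to the fine grid (one coarse cell = 2 fine cells).
    motion_f = np.repeat(np.repeat(motion_c, 2, axis=0), 2, axis=1) * 2
    H, W, _ = cur_fine.shape
    grid_f = np.stack(np.meshgrid(np.arange(H), np.arange(W), indexing="ij"), axis=-1)
    centers_f = grid_f + np.rint(motion_f).astype(int)

    fused, _ = window_attention(cur_fine, prev_fine, centers_f, radius)
    return cur_fine + fused  # residual fusion (an illustrative choice)
```

Calling `coarse_to_fine_fuse(cur, prev)` on two `(H, W, C)` feature maps returns a fused map of the same shape. The coarse pass lets a small fine-scale window reach features that have moved farther than `radius` cells between frames, which is the motivation the abstract gives for the hierarchical strategy.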
