Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

Paper title

Cylinder3D: An Effective 3D Framework for Driving-scene LiDAR Semantic Segmentation

Authors

Hui Zhou, Xinge Zhu, Xiao Song, Yuexin Ma, Zhe Wang, Hongsheng Li, Dahua Lin

Abstract

State-of-the-art methods for large-scale driving-scene LiDAR semantic segmentation often project the point clouds into 2D space and process them there. Projection methods include spherical projection, bird's-eye-view projection, etc. Although this process makes the point cloud suitable for 2D CNN-based networks, it inevitably alters and abandons the 3D topology and geometric relations. A straightforward solution to the issue of 3D-to-2D projection is to keep the 3D representation and process the points in 3D space. In this work, we first perform an in-depth analysis of different representations and backbones in 2D and 3D spaces, and reveal the effectiveness of 3D representations and networks for LiDAR segmentation. Then, we develop a framework based on a 3D cylinder partition and 3D cylinder convolution, termed Cylinder3D, which exploits the 3D topology relations and structures of driving-scene point clouds. Moreover, a dimension-decomposition based context modeling module is introduced to explore the high-rank context information in point clouds in a progressive manner. We evaluate the proposed model on a large-scale driving-scene dataset, i.e. SemanticKITTI. Our method achieves state-of-the-art performance and outperforms existing methods by 6% in terms of mIoU.
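To make the idea of a 3D cylinder partition more concrete, the following is a minimal sketch of how points could be binned into cells of a cylindrical grid instead of a Cartesian one. It is not the authors' implementation. The function names (`cylinder_voxel_index`, `partition`), the grid resolution, and the radius and height ranges are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math

def cylinder_voxel_index(point, grid=(480, 360, 32),
                         rho_max=50.0, z_min=-4.0, z_max=2.0):
    """Map a Cartesian point (x, y, z) to a (rho, theta, z) cell index.

    All default values are illustrative assumptions, not values taken
    from the abstract. Returns None for points outside the grid.
    """
    x, y, z = point
    rho = math.hypot(x, y)       # distance from the sensor's vertical axis
    theta = math.atan2(y, x)     # azimuth angle in [-pi, pi]
    if rho >= rho_max or not (z_min <= z < z_max):
        return None
    n_rho, n_theta, n_z = grid
    i_rho = int(rho / rho_max * n_rho)
    # Shift the angle to [0, 2*pi] and wrap the upper edge back to cell 0.
    i_theta = int((theta + math.pi) / (2 * math.pi) * n_theta) % n_theta
    i_z = int((z - z_min) / (z_max - z_min) * n_z)
    return (i_rho, i_theta, i_z)

def partition(points, **kwargs):
    """Group points by cylindrical cell; out-of-range points are dropped."""
    cells = {}
    for p in points:
        idx = cylinder_voxel_index(p, **kwargs)
        if idx is not None:
            cells.setdefault(idx, []).append(p)
    return cells

if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.0), (1.01, 0.0, 0.0), (100.0, 0.0, 0.0)]
    for cell, members in partition(cloud).items():
        print(cell, len(members))
```

Because each angular bin spans a fixed angle, a cell covers more ground area the farther it lies from the sensor. This is one plausible reading of how such a partition follows the structure of driving-scene point clouds, which are dense near the sensor and sparse far away.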
