Paper Title
Point-to-Voxel Knowledge Distillation for LiDAR Semantic Segmentation
Authors
Abstract
This paper addresses the problem of distilling knowledge from a large teacher model to a slim student network for LiDAR semantic segmentation. Directly employing previous distillation approaches yields inferior results due to the intrinsic challenges of point clouds, i.e., sparsity, randomness, and varying density. To tackle the aforementioned problems, we propose Point-to-Voxel Knowledge Distillation (PVD), which transfers hidden knowledge at both the point level and the voxel level. Specifically, we first leverage both pointwise and voxelwise output distillation to complement the sparse supervision signals. Then, to better exploit the structural information, we divide the whole point cloud into several supervoxels and design a difficulty-aware sampling strategy to more frequently sample supervoxels containing less-frequent classes and faraway objects. On these supervoxels, we propose inter-point and inter-voxel affinity distillation, where the similarity information between points and between voxels helps the student model better capture the structural information of the surrounding environment. We conduct extensive experiments on two popular LiDAR segmentation benchmarks, i.e., nuScenes and SemanticKITTI. On both benchmarks, our PVD consistently outperforms previous distillation approaches by a large margin on three representative backbones, i.e., Cylinder3D, SPVNAS, and MinkowskiNet. Notably, on the challenging nuScenes and SemanticKITTI datasets, our method achieves roughly 75% MACs reduction and a 2x speedup on the competitive Cylinder3D model, and ranks 1st on the SemanticKITTI leaderboard among all published algorithms. Our code is available at https://github.com/cardwing/Codes-for-PVKD.
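The pointwise and voxelwise output distillation described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the temperature, the mean-pooling of point logits into voxel logits, and the loss weights are illustrative assumptions; the standard soft-label KD loss (KL divergence between temperature-scaled teacher and student outputs) is used for both levels.

```python
import numpy as np

def softmax(x, T=1.0, axis=-1):
    """Numerically stable temperature-scaled softmax."""
    z = x / T
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over class distributions, averaged over rows.

    The usual T^2 factor keeps gradient magnitudes comparable across temperatures.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl) * T * T)

def voxel_pool(logits, voxel_ids, num_voxels):
    """Average point logits within each voxel to form voxel-level logits
    (an assumed pooling scheme for this sketch)."""
    n_classes = logits.shape[1]
    pooled = np.zeros((num_voxels, n_classes))
    counts = np.zeros(num_voxels)
    np.add.at(pooled, voxel_ids, logits)   # scatter-add point logits per voxel
    np.add.at(counts, voxel_ids, 1)
    return pooled / np.maximum(counts, 1)[:, None]

def point_to_voxel_output_kd(s_logits, t_logits, voxel_ids, num_voxels,
                             w_point=1.0, w_voxel=1.0):
    """Combine point-level and voxel-level output distillation losses."""
    l_point = kd_loss(s_logits, t_logits)
    l_voxel = kd_loss(voxel_pool(s_logits, voxel_ids, num_voxels),
                      voxel_pool(t_logits, voxel_ids, num_voxels))
    return w_point * l_point + w_voxel * l_voxel
```

The voxel-level term averages logits over all points in a voxel, so it remains informative even where point-level supervision is sparse, which is the motivation given in the abstract.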
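The difficulty-aware supervoxel sampling can likewise be sketched. The weighting below (a linear combination of the rare-class ratio and the normalized distance of each supervoxel, with coefficients `alpha` and `beta`) is a hypothetical formulation for illustration; the paper's actual sampling criterion may differ.

```python
import numpy as np

def difficulty_aware_sampling(rare_class_ratio, mean_distance, num_samples,
                              alpha=1.0, beta=1.0, rng=None):
    """Sample supervoxel indices, favoring 'hard' supervoxels.

    rare_class_ratio: per-supervoxel fraction of points from infrequent classes
    mean_distance:    per-supervoxel mean point-to-sensor distance
    Supervoxels with more rare-class points and larger distances receive
    larger sampling weights (assumed linear weighting for this sketch).
    """
    rng = rng or np.random.default_rng(0)
    d = mean_distance / (mean_distance.max() + 1e-12)  # normalize to [0, 1]
    weights = alpha * rare_class_ratio + beta * d
    probs = weights / weights.sum()
    return rng.choice(len(weights), size=num_samples, replace=True, p=probs)
```

Sampling with replacement lets difficult supervoxels appear in many mini-batches, which is the "more frequently sample" behavior the abstract describes.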
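Finally, the inter-point and inter-voxel affinity distillation can be sketched as matching pairwise-similarity matrices between teacher and student features. The cosine-similarity affinity and the mean-squared-error matching loss are common choices assumed here, not necessarily the paper's exact formulation; the same function applies to point features or voxel features sampled from a supervoxel.

```python
import numpy as np

def affinity_matrix(feats):
    """Pairwise cosine-similarity affinity among feature vectors (rows)."""
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    return f @ f.T

def affinity_distillation_loss(student_feats, teacher_feats):
    """Penalize the mismatch between student and teacher affinity structure."""
    a_s = affinity_matrix(student_feats)
    a_t = affinity_matrix(teacher_feats)
    return float(np.mean((a_s - a_t) ** 2))
```

Because the loss compares relative similarities rather than raw features, the student is pushed to reproduce the teacher's view of which points and voxels belong together, i.e., the structural information of the scene.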