Paper Title
Self-Supervised Pre-training of 3D Point Cloud Networks with Image Data
Paper Authors
Paper Abstract
Reducing the quantity of annotations required for supervised training is vital when labels are scarce and costly. This reduction is especially important for semantic segmentation tasks involving 3D datasets, which are often significantly smaller and more challenging to annotate than their image-based counterparts. Self-supervised pre-training on large unlabelled datasets is one way to reduce the amount of manual annotation needed. Previous work has focused on pre-training with point cloud data exclusively; this approach often requires two or more registered views. In the present work, we combine image and point cloud modalities by first learning self-supervised image features and then using these features to train a 3D model. By incorporating image data, which many 3D datasets already include, our pre-training method requires only a single scan of a scene. We demonstrate that our pre-training approach, despite using single scans, achieves performance comparable to other multi-scan, point cloud-only methods.
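The core idea of transferring self-supervised image features to a 3D network can be sketched as 2D-to-3D feature distillation: points are projected into the image, features from a frozen image encoder are sampled at the projected pixels, and the point network is trained to match them. The following is a minimal toy sketch of that general recipe, not the paper's exact method; the pinhole intrinsics, feature dimensions, and the linear stand-in for the 3D network are all illustrative assumptions.

```python
import torch

def project_points(points, K):
    """Pinhole projection of N x 3 camera-frame points to N x 2 pixel coords."""
    uv = (K @ points.T).T          # N x 3 homogeneous image coords
    return uv[:, :2] / uv[:, 2:3]  # divide by depth

# Toy data: 128 points centred in front of a camera with intrinsics K (assumed).
torch.manual_seed(0)
points = torch.rand(128, 3)
points[:, :2] -= 0.5               # x, y in [-0.5, 0.5)
points[:, 2] += 1.0                # z in [1, 2), so all points project in-bounds
K = torch.tensor([[100.0, 0.0, 64.0],
                  [0.0, 100.0, 64.0],
                  [0.0, 0.0, 1.0]])

# Stand-ins for the two models: a frozen self-supervised 2D feature map
# (in practice the output of a pre-trained image encoder) and a toy linear
# layer playing the role of the 3D point network being pre-trained.
image_features = torch.randn(1, 32, 128, 128)  # frozen, C=32, H=W=128
point_net = torch.nn.Linear(3, 32)

# Sample the image features at each point's projected pixel location.
uv = project_points(points, K)
grid = (uv / 63.5 - 1.0).view(1, 1, -1, 2)     # normalise pixels to [-1, 1]
targets = torch.nn.functional.grid_sample(
    image_features, grid, align_corners=True).squeeze().T  # N x 32

# Train the 3D network to reproduce the sampled 2D features per point.
pred = point_net(points)                        # N x 32 point features
loss = torch.nn.functional.mse_loss(pred, targets.detach())
loss.backward()                                 # gradients reach only the 3D net
```

Because only one image-point pair per scene is needed to form these correspondences, this style of objective works from a single scan, unlike contrastive point-cloud objectives that need two or more registered views.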