Paper Title

3D Sketch-aware Semantic Scene Completion via Semi-supervised Structure Prior

Authors

Xiaokang Chen, Kwan-Yee Lin, Chen Qian, Gang Zeng, Hongsheng Li

Abstract

The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict, from a single-view observation, a completed 3D voxel representation of volumetric occupancy and the semantic labels of objects in the scene. Since the computational cost generally increases explosively with voxel resolution, most current state-of-the-art methods have to tailor their frameworks to a low-resolution representation, sacrificing detail prediction. Voxel resolution thus becomes one of the crucial difficulties leading to the performance bottleneck. In this paper, we propose a new geometry-based strategy to embed depth information into a low-resolution voxel representation that can still encode sufficient geometric information, e.g., room layout and objects' sizes and shapes, to infer the invisible areas of the scene with well-preserved structural details. To this end, we first propose a novel 3D sketch-aware feature embedding to explicitly encode geometric information both effectively and efficiently. With the 3D sketch in hand, we further devise a simple yet effective semantic scene completion framework that incorporates a lightweight 3D Sketch Hallucination module to guide the inference of occupancy and semantic labels via a semi-supervised structure prior learning strategy. We demonstrate that our proposed geometric embedding works better than the depth features learned by conventional SSC frameworks. Our final model consistently surpasses state-of-the-art methods on three public benchmarks, while requiring 3D volumes of only 60 x 36 x 60 resolution for both input and output. The code and the supplementary material will be available at https://charlesCXK.github.io.
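The abstract describes a "3D sketch" that keeps only the geometric structure of a voxel volume. The paper's actual embedding is not reproduced here; as a minimal illustrative sketch (all names and the boundary-extraction rule are assumptions, not the authors' implementation), one can approximate such a structure prior by keeping only occupied voxels that touch free space in a binary occupancy grid:

```python
import numpy as np

def boundary_sketch(occ: np.ndarray) -> np.ndarray:
    """Mark occupied voxels that have at least one free 6-neighbor.

    `occ` is a binary (D, H, W) occupancy grid. The returned mask is a
    crude stand-in for a 3D sketch: it keeps only voxels on geometric
    boundaries, where structural detail lives.
    """
    occ = occ.astype(bool)
    # Pad with free space so grid borders count as boundaries.
    padded = np.pad(occ, 1, constant_values=False)
    has_free_neighbor = np.zeros_like(occ)
    for axis in range(3):
        for shift in (-1, 1):
            # Shifted copy: neighbor[i,j,k] is the occupancy of the
            # adjacent voxel along `axis`; out-of-range reads are False.
            neighbor = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
            has_free_neighbor |= ~neighbor
    return occ & has_free_neighbor

# A solid 2x2x2 block inside a 4x4x4 grid: every occupied voxel touches
# free space, so the sketch keeps all 8 of them.
grid = np.zeros((4, 4, 4), dtype=bool)
grid[1:3, 1:3, 1:3] = True
print(int(boundary_sketch(grid).sum()))  # prints 8
```

For a larger solid object, interior voxels drop out and only the shell survives, which is the intuition behind feeding a sparse geometric prior to a low-resolution completion network.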
