Paper Title

Semantic Implicit Neural Scene Representations With Semi-Supervised Training

Authors

Amit Kohli, Vincent Sitzmann, Gordon Wetzstein

Abstract

The recent success of implicit neural scene representations has presented a viable new method for how we capture and store 3D scenes. Unlike conventional 3D representations, such as point clouds, which explicitly store scene properties in discrete, localized units, these implicit representations encode a scene in the weights of a neural network which can be queried at any coordinate to produce these same scene properties. Thus far, implicit representations have primarily been optimized to estimate only the appearance and/or 3D geometry information in a scene. We take the next step and demonstrate that an existing implicit representation (SRNs) is actually multi-modal; it can be further leveraged to perform per-point semantic segmentation while retaining its ability to represent appearance and geometry. To achieve this multi-modal behavior, we utilize a semi-supervised learning strategy atop the existing pre-trained scene representation. Our method is simple, general, and only requires a few tens of labeled 2D segmentation masks in order to achieve dense 3D semantic segmentation. We explore two novel applications for this semantically aware implicit neural scene representation: 3D novel view and semantic label synthesis given only a single input RGB image or 2D label mask, as well as 3D interpolation of appearance and semantics.
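The abstract describes adding a per-point semantic segmentation capability on top of a frozen, pre-trained implicit scene representation, supervised by only a few tens of labeled 2D masks. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the class names (SceneMLP, SemanticHead), network sizes, class count, and the dummy training batch are all illustrative assumptions.

```python
# Sketch: a frozen, pre-trained coordinate MLP (standing in for an SRN)
# produces per-point features; a small linear head on top of those features
# predicts semantic logits, trained from sparse 2D mask labels.
import torch
import torch.nn as nn

NUM_CLASSES = 6  # hypothetical number of semantic classes


class SceneMLP(nn.Module):
    """Stand-in for a pre-trained implicit scene representation: maps an
    (x, y, z) coordinate to a feature vector from which appearance and
    geometry would be decoded."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden),
        )

    def forward(self, coords):          # coords: (N, 3)
        return self.net(coords)         # per-point features: (N, hidden)


class SemanticHead(nn.Module):
    """Small head that turns per-point features into semantic logits."""
    def __init__(self, hidden=256, num_classes=NUM_CLASSES):
        super().__init__()
        self.linear = nn.Linear(hidden, num_classes)

    def forward(self, features):
        return self.linear(features)


# Semi-supervised setup: the scene representation is pre-trained and frozen,
# so appearance and geometry are retained; only the semantic head is fitted.
scene = SceneMLP()
for p in scene.parameters():
    p.requires_grad_(False)

head = SemanticHead()
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Dummy batch standing in for 3D points sampled along camera rays through
# pixels of a labeled 2D segmentation mask.
coords = torch.rand(1024, 3)
labels = torch.randint(0, NUM_CLASSES, (1024,))

logits = head(scene(coords))
loss = criterion(logits, labels)
loss.backward()
optimizer.step()
```

In the paper's actual setting, the features come from the SRN's per-point representation rendered along camera rays, and supervision comes from a small set of 2D label masks; the sketch collapses this to pre-sampled points with known labels to keep the example self-contained.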
