Paper Title
Anisotropic Convolutional Networks for 3D Semantic Scene Completion
Paper Authors
Abstract
As a voxel-wise labeling task, semantic scene completion (SSC) tries to simultaneously infer the occupancy and semantic labels for a scene from a single depth and/or RGB image. The key challenge for SSC is how to effectively exploit 3D context to model objects and stuff with severe variations in shape, layout, and visibility. To handle such variations, we propose a novel module called anisotropic convolution, which offers a flexibility and modeling power that competing methods, such as standard 3D convolution and some of its variations, cannot provide. In contrast to standard 3D convolution, which is limited to a fixed 3D receptive field, our module is capable of modeling dimensional anisotropy voxel-wise. The basic idea is to enable an anisotropic 3D receptive field by decomposing a 3D convolution into three consecutive 1D convolutions, where the kernel size of each 1D convolution is adaptively determined on the fly. By stacking multiple such anisotropic convolution modules, the voxel-wise modeling capability can be further enhanced while keeping the number of model parameters under control. Extensive experiments on two SSC benchmarks, NYU-Depth-v2 and NYUCAD, show the superior performance of the proposed method. Our code is available at https://waterljwant.github.io/SSC/
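To illustrate the decomposition idea, the following is a minimal NumPy sketch, not the authors' implementation: it uses fixed averaging kernels rather than learned weights and fixed rather than adaptively selected kernel sizes, but it shows how three consecutive 1D convolutions along the three axes yield an anisotropic kx × ky × kz receptive field on a voxel grid.

```python
import numpy as np

def conv1d_along_axis(vol, kernel, axis):
    """Apply a 1D convolution along one axis of a 3D volume ('same' padding)."""
    k = len(kernel)
    pad = k // 2
    v = np.moveaxis(vol, axis, 0)                     # bring target axis to front
    v = np.pad(v, [(pad, pad)] + [(0, 0)] * (v.ndim - 1))
    out = np.zeros(np.moveaxis(vol, axis, 0).shape, dtype=float)
    for i, w in enumerate(kernel):                    # shift-and-accumulate
        out += w * v[i:i + out.shape[0]]
    return np.moveaxis(out, 0, axis)

def anisotropic_conv(vol, kx, ky, kz):
    """Three consecutive 1D convs -> anisotropic kx x ky x kz receptive field."""
    for axis, k in enumerate((kx, ky, kz)):
        vol = conv1d_along_axis(vol, np.ones(k), axis)
    return vol

# Receptive-field check: a unit impulse spreads to a kx x ky x kz box.
vol = np.zeros((9, 9, 9))
vol[4, 4, 4] = 1.0
out = anisotropic_conv(vol, 3, 5, 7)
nz = np.nonzero(out)
extent = [a.max() - a.min() + 1 for a in nz]
print(extent)  # [3, 5, 7]
```

Note that the decomposition also explains the paper's parameter-count claim: a dense k × k × k kernel has k³ weights per channel pair, while three 1D kernels need only 3k, so stacking several such modules stays cheap. In the actual module, per-voxel gating would choose among candidate kernel sizes for each 1D convolution instead of fixing them as above.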