Paper Title

Topologically-Aware Deformation Fields for Single-View 3D Reconstruction

Paper Authors

Shivam Duggal, Deepak Pathak

Paper Abstract

We present a framework for learning 3D object shapes and dense cross-object 3D correspondences from just an unaligned category-specific image collection. The 3D shapes are generated implicitly as deformations to a category-specific signed distance field and are learned in an unsupervised manner solely from unaligned image collections and their poses without any 3D supervision. Generally, image collections on the internet contain several intra-category geometric and topological variations, for example, different chairs can have different topologies, which makes the task of joint shape and correspondence estimation much more challenging. Because of this, prior works either focus on learning each 3D object shape individually without modeling cross-instance correspondences or perform joint shape and correspondence estimation on categories with minimal intra-category topological variations. We overcome these restrictions by learning a topologically-aware implicit deformation field that maps a 3D point in the object space to a higher dimensional point in the category-specific canonical space. At inference time, given a single image, we reconstruct the underlying 3D shape by first implicitly deforming each 3D point in the object space to the learned category-specific canonical space using the topologically-aware deformation field and then reconstructing the 3D shape as a canonical signed distance field. Both canonical shape and deformation field are learned end-to-end in an inverse-graphics fashion using a learned recurrent ray marcher (SRN) as a differentiable rendering module. Our approach, dubbed TARS, achieves state-of-the-art reconstruction fidelity on several datasets: ShapeNet, Pascal3D+, CUB, and Pix3D chairs. Result videos and code at https://shivamduggal4.github.io/tars-3D/
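
To make the method concrete, here is a minimal PyTorch sketch of the composition the abstract describes: an image-conditioned deformation field lifts each object-space 3D point into a higher-dimensional canonical space, where a single category-level signed distance field is queried. The layer sizes, the 4-dimensional canonical space, and the shape code `z` are illustrative assumptions rather than the paper's actual architecture, and the differentiable ray marcher (SRN) used for training is omitted.

```python
import torch
import torch.nn as nn

class DeformationField(nn.Module):
    """Maps an object-space 3D point, conditioned on a per-image shape
    code, to a higher-dimensional point in the category canonical space."""
    def __init__(self, latent_dim=256, canonical_dim=4, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, canonical_dim),
        )

    def forward(self, x, z):
        # x: (N, 3) object-space query points; z: (N, latent_dim) shape code
        return self.net(torch.cat([x, z], dim=-1))

class CanonicalSDF(nn.Module):
    """Category-level signed distance field over the (higher-dimensional)
    canonical space, shared across all instances of the category."""
    def __init__(self, canonical_dim=4, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(canonical_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, p):
        return self.net(p)  # (N, 1) signed distance to the surface

# The instance shape is the composition: deform object-space points into
# the canonical space, then query the shared canonical SDF there.
deform, canonical_sdf = DeformationField(), CanonicalSDF()
x = torch.rand(1024, 3) * 2 - 1            # object-space samples in [-1, 1]^3
z = torch.randn(1, 256).expand(1024, -1)   # hypothetical image-derived shape code
sdf_values = canonical_sdf(deform(x, z))   # (1024, 1)
```

During training, the predicted signed distances would be rendered by the recurrent ray marcher and compared against the input images, so gradients flow through both the canonical SDF and the deformation field end-to-end.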
