Magic3D: High-Resolution Text-to-3D Content Creation

Paper Title

Magic3D: High-Resolution Text-to-3D Content Creation

Authors

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, Tsung-Yi Lin

Abstract

DreamFusion has recently demonstrated the utility of a pre-trained text-to-image diffusion model to optimize Neural Radiance Fields (NeRF), achieving remarkable text-to-3D synthesis results. However, the method has two inherent limitations: (a) extremely slow optimization of NeRF and (b) low-resolution image-space supervision on NeRF, leading to low-quality 3D models with a long processing time. In this paper, we address these limitations by utilizing a two-stage optimization framework. First, we obtain a coarse model using a low-resolution diffusion prior and accelerate the optimization with a sparse 3D hash grid structure. Using the coarse representation as the initialization, we further optimize a textured 3D mesh model with an efficient differentiable renderer interacting with a high-resolution latent diffusion model. Our method, dubbed Magic3D, can create high-quality 3D mesh models in 40 minutes, which is 2x faster than DreamFusion (reportedly taking 1.5 hours on average), while also achieving higher resolution. User studies show that 61.7% of raters prefer our approach over DreamFusion. Together with the image-conditioned generation capabilities, we provide users with new ways to control 3D synthesis, opening up new avenues to various creative applications.
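
The "2x faster" figure can be checked against the two timings quoted in the abstract. The sketch below is only an arithmetic check; the function name `speedup` and the variable names are illustrative and do not come from the paper.

```python
def speedup(baseline_minutes: float, method_minutes: float) -> float:
    """Return how many times faster the method is than the baseline."""
    return baseline_minutes / method_minutes


# Timings as quoted in the abstract (illustrative variable names).
dreamfusion_minutes = 1.5 * 60  # DreamFusion: reportedly 1.5 hours on average
magic3d_minutes = 40.0          # Magic3D: 40 minutes

ratio = speedup(dreamfusion_minutes, magic3d_minutes)
print(f"{ratio:.2f}x")  # prints "2.25x"
```

The ratio of 2.25 is consistent with the abstract's rounded claim of roughly 2x.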
