3DLatNav: Navigating Generative Latent Spaces for Semantic-Aware 3D Object Manipulation

Title

3DLatNav: Navigating Generative Latent Spaces for Semantic-Aware 3D Object Manipulation

Authors

Amaya Dharmasiri, Dinithi Dissanayake, Mohamed Afham, Isuru Dissanayake, Ranga Rodrigo, Kanchana Thilakarathna

Abstract

3D generative models have recently been successful in generating realistic 3D objects in the form of point clouds. However, most models do not offer the controllability to manipulate the shape semantics of component object parts without extensive semantic attribute labels or other reference point clouds. Moreover, beyond the ability to perform simple latent vector arithmetic or interpolations, there is a lack of understanding of how part-level semantics of 3D shapes are encoded in their corresponding generative latent spaces. In this paper, we propose 3DLatNav: a novel approach to navigating pretrained generative latent spaces to enable controlled part-level semantic manipulation of 3D objects. First, we propose a part-level weakly-supervised shape semantics identification mechanism using latent representations of 3D shapes. Then, we transfer that knowledge to a pretrained 3D object generative latent space to unravel disentangled embeddings that represent different shape semantics of an object's component parts in the form of linear subspaces, despite the unavailability of part-level labels during training. Finally, we utilize those identified subspaces to show that controllable 3D object part manipulation can be achieved by applying the proposed framework to any pretrained 3D generative model. With two novel quantitative metrics to evaluate the consistency and localization accuracy of part-level manipulations, we show that 3DLatNav outperforms existing unsupervised latent disentanglement methods in identifying latent directions that encode part-level shape semantics of 3D objects. With multiple ablation studies and tests on state-of-the-art generative models, we show that 3DLatNav can implement controlled part-level semantic manipulations on an input point cloud while preserving other features and the realistic nature of the object.
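The abstract states that part-level shape semantics are represented as linear subspaces of a pretrained generative latent space, and that moving within those subspaces manipulates object parts. The sketch below illustrates only the generic linear-algebra step of editing a latent code inside a given subspace; it is not the authors' procedure for identifying the subspaces. It assumes NumPy, a latent code `z` already produced by some pretrained encoder, and a set of direction vectors for one part semantic. The function names, the 128-dimensional latent size, and the random stand-in directions are hypothetical, and decoding back to a point cloud is omitted.

```python
import numpy as np


def orthonormal_basis(directions: np.ndarray) -> np.ndarray:
    """Orthonormalize k latent directions (rows of a k x d array) into a d x k basis."""
    # Reduced QR of the d x k matrix: q has orthonormal columns spanning the same
    # subspace (assumes the k directions are linearly independent).
    q, _ = np.linalg.qr(directions.T)
    return q


def subspace_coords(z: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Coordinates of latent code z inside the subspace spanned by the basis columns."""
    return basis.T @ z


def set_subspace_coords(z: np.ndarray, basis: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Replace z's in-subspace component with `target`; the orthogonal component is kept as-is."""
    return z - basis @ (basis.T @ z) + basis @ target


def move_along(z: np.ndarray, basis: np.ndarray, step: np.ndarray) -> np.ndarray:
    """Shift z by `step` (one scalar per basis direction) within the subspace."""
    return z + basis @ step


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent_dim = 128                                  # hypothetical latent size
    z = rng.standard_normal(latent_dim)               # stand-in for an encoded point cloud
    part_dirs = rng.standard_normal((2, latent_dim))  # hypothetical directions for one part semantic
    B = orthonormal_basis(part_dirs)

    z_edit = set_subspace_coords(z, B, np.array([3.0, -1.0]))
    print("before:", subspace_coords(z, B))
    print("after: ", subspace_coords(z_edit, B))
    # The change z_edit - z should lie entirely inside the subspace (drift near zero).
    delta = z_edit - z
    print("orthogonal drift:", np.linalg.norm(delta - B @ (B.T @ delta)))
```

Setting the subspace coordinates, rather than only adding to them, keeps the target of an edit explicit. In both cases the component of `z` orthogonal to the subspace is unchanged; whether that actually leaves the other parts of the decoded shape intact depends on how well the identified subspace is disentangled and on the decoder.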
