Paper Title
Drivable Volumetric Avatars using Texel-Aligned Features
Paper Authors
Paper Abstract
Photorealistic telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance that is indistinguishable from reality. In this work, we propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people. One challenge is driving an avatar while staying faithful to details and dynamics that cannot be captured by a global low-dimensional parameterization such as body pose. Our approach supports driving clothed avatars with wrinkles and motion that a real driving performer exhibits beyond the training corpus. Unlike existing global state representations or non-parametric screen-space approaches, we introduce texel-aligned features -- a localized representation that can leverage both the structural prior of a skeleton-based parametric model and observed sparse image signals at the same time. Another challenge is modeling a temporally coherent clothed avatar, which typically requires precise surface tracking. To circumvent this, we propose a novel volumetric avatar representation by extending mixtures of volumetric primitives to articulated objects. By explicitly incorporating articulation, our approach naturally generalizes to unseen poses. We also introduce a localized viewpoint conditioning, which leads to a large improvement in generalization of view-dependent appearance. The proposed volumetric representation does not require high-quality mesh tracking as a prerequisite and brings significant quality improvements compared to mesh-based counterparts. In our experiments, we carefully examine our design choices and demonstrate the efficacy of our approach, outperforming state-of-the-art methods on challenging driving scenarios.
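To make the two ideas named in the abstract concrete, below is a minimal PyTorch sketch, not the authors' code: all module names, tensor shapes, primitive counts, and the `uv_to_cam_grid` warp are illustrative assumptions. It shows (a) texel-aligned features, where a pose-conditioned UV feature map is fused with image features unprojected from a sparse driving view into the same UV space, and (b) a decoder that emits a mixture of volumetric primitives whose base positions follow skeleton joints, with a per-primitive view direction appended for localized view conditioning.

```python
# Hypothetical sketch of texel-aligned features + articulated volumetric
# primitives. Shapes, dims, and the UV-to-camera sampling grid are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TexelAlignedAvatar(nn.Module):
    def __init__(self, uv_res=128, n_prims=256, feat_dim=32, pose_dim=72):
        super().__init__()
        self.uv_res, self.n_prims, self.feat_dim = uv_res, n_prims, feat_dim
        # Pose -> coarse UV feature map (the skeletal/parametric prior).
        self.pose_net = nn.Sequential(
            nn.Linear(pose_dim, feat_dim * 8 * 8), nn.ReLU())
        # Fuse pose features with unprojected image features in UV space.
        self.fuse = nn.Conv2d(feat_dim + 3, feat_dim, 3, padding=1)
        # Per-primitive heads: geometry residual and view-conditioned payload.
        self.geom_head = nn.Linear(feat_dim, 9)  # d_pos(3) + d_rot(3) + scale(3)
        self.payload_head = nn.Linear(feat_dim + 3, 4 * 8 ** 3)  # RGBA voxels

    def forward(self, pose, image, uv_to_cam_grid, joint_centers, view_dirs):
        """pose: (B, pose_dim); image: (B, 3, H, W);
        uv_to_cam_grid: (B, uv_res, uv_res, 2) sampling grid in [-1, 1]
            mapping each texel to its visible pixel in the driving view;
        joint_centers: (B, n_prims, 3) base primitive centers from the
            articulated (skeleton-driven) parametric model;
        view_dirs: (B, n_prims, 3) per-primitive view directions."""
        B = pose.shape[0]
        pose_feat = self.pose_net(pose).view(B, self.feat_dim, 8, 8)
        pose_feat = F.interpolate(pose_feat, size=self.uv_res,
                                  mode='bilinear', align_corners=False)
        # Unproject the sparse driving image into UV space (texel alignment).
        img_feat = F.grid_sample(image, uv_to_cam_grid, align_corners=False)
        texel_feat = self.fuse(torch.cat([pose_feat, img_feat], dim=1))
        # One feature vector per primitive, pooled from its UV region
        # (a fixed uniform texel-to-primitive assignment, for simplicity).
        side = int(self.n_prims ** 0.5)
        prim_feat = F.adaptive_avg_pool2d(texel_feat, side)
        prim_feat = prim_feat.flatten(2).transpose(1, 2)   # (B, n_prims, C)
        geom = self.geom_head(prim_feat)
        # Articulation gives the base placement; the network adds a delta.
        centers = joint_centers + geom[..., :3]
        # Localized view conditioning: each primitive sees only its own
        # view direction, rather than a single global camera code.
        payload = self.payload_head(torch.cat([prim_feat, view_dirs], dim=-1))
        payload = payload.view(B, self.n_prims, 4, 8, 8, 8)
        return centers, geom[..., 3:], payload
```

The sketch reflects the abstract's design choice: because features live in texel space and primitives are anchored to the articulated skeleton, pose generalization comes from the structural prior rather than from memorizing global states, and the per-primitive view direction localizes the view-dependent appearance model. Raymarching the resulting primitive mixture into images is omitted here.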