CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes

Paper Title

CLIP-Actor: Text-Driven Recommendation and Stylization for Animating Human Meshes

Authors

Youwang, Kim; Ji-Yeon, Kim; Oh, Tae-Hyun

Abstract

We propose CLIP-Actor, a text-driven motion recommendation and neural mesh stylization system for human mesh animation. CLIP-Actor animates a 3D human mesh to conform to a text prompt by recommending a motion sequence and optimizing mesh style attributes. We build a text-driven human motion recommendation system by leveraging a large-scale human motion dataset with language labels. Given a natural language prompt, CLIP-Actor suggests a text-conforming human motion in a coarse-to-fine manner. Then, our novel zero-shot neural style optimization detailizes and texturizes the recommended mesh sequence to conform to the prompt in a temporally-consistent and pose-agnostic manner. This is distinctive in that prior work fails to generate plausible results when the pose of an artist-designed mesh does not conform to the text from the beginning. We further propose the spatio-temporal view augmentation and mask-weighted embedding attention, which stabilize the optimization process by leveraging multi-frame human motion and rejecting poorly rendered views. We demonstrate that CLIP-Actor produces plausible and human-recognizable style 3D human mesh in motion with detailed geometry and texture solely from a natural language prompt.
