Paper Title
Audio- and Gaze-driven Facial Animation of Codec Avatars
Paper Authors
Paper Abstract
Codec Avatars are a recent class of learned, photorealistic face models that accurately represent the geometry and texture of a person in 3D (i.e., for virtual reality) and are almost indistinguishable from video. In this paper, we describe the first approach to animate these parametric models in real time, using audio and/or eye tracking, that could be deployed on commodity virtual reality hardware. Our goal is to display expressive conversations between individuals that exhibit important social signals such as laughter and excitement solely from latent cues in our lossy input signals. To this end, we collected over 5 hours of high-frame-rate 3D face scans across three participants, including traditional neutral speech as well as expressive and conversational speech. We investigate a multimodal fusion approach that dynamically identifies which sensor encoding should animate which parts of the face at any given time. See the supplemental video, which demonstrates our ability to generate full-face motion far beyond the typically neutral lip articulations seen in competing work: https://research.fb.com/videos/audio-and-gaze-driven-facial-animation-of-codec-avatars/
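
The abstract only sketches the multimodal fusion idea at a high level. As a rough illustration (not the authors' actual architecture), the following minimal PyTorch-style sketch shows one way dynamic, per-region gating between an audio encoding and a gaze encoding could be implemented. All module names, feature dimensions, the number of face regions, and the gating design are assumptions made for illustration and are not taken from the paper.

# A minimal sketch (assumed architecture, not the paper's actual model) of
# multimodal fusion with per-region gating: audio and gaze encodings are
# blended with weights predicted per face region, then decoded into a
# latent expression code that could drive a Codec Avatar. Dimensions and
# region count below are illustrative assumptions.
import torch
import torch.nn as nn


class GatedMultimodalFusion(nn.Module):
    def __init__(self, audio_dim=256, gaze_dim=32, latent_dim=256, n_regions=8):
        super().__init__()
        self.n_regions = n_regions
        # Per-modality projections into a shared latent space.
        self.audio_proj = nn.Linear(audio_dim, latent_dim)
        self.gaze_proj = nn.Linear(gaze_dim, latent_dim)
        # Predicts, for each face region, how much each modality should
        # contribute at the current frame (softmax over the two modalities).
        self.gate = nn.Sequential(
            nn.Linear(audio_dim + gaze_dim, 128),
            nn.ReLU(),
            nn.Linear(128, n_regions * 2),
        )
        # Maps the fused per-region features to the avatar's expression code.
        self.decoder = nn.Linear(n_regions * latent_dim, latent_dim)

    def forward(self, audio_feat, gaze_feat):
        # audio_feat: (B, audio_dim), gaze_feat: (B, gaze_dim)
        a = self.audio_proj(audio_feat)                      # (B, latent_dim)
        g = self.gaze_proj(gaze_feat)                        # (B, latent_dim)
        w = self.gate(torch.cat([audio_feat, gaze_feat], dim=-1))
        w = w.view(-1, self.n_regions, 2).softmax(dim=-1)    # (B, R, 2)
        # Blend the two modality encodings independently for each region.
        stacked = torch.stack([a, g], dim=1)                 # (B, 2, latent_dim)
        fused = torch.einsum('brm,bmd->brd', w, stacked)     # (B, R, latent_dim)
        return self.decoder(fused.flatten(1))                # (B, latent_dim)


if __name__ == "__main__":
    model = GatedMultimodalFusion()
    audio = torch.randn(4, 256)      # e.g., features from an audio encoder
    gaze = torch.randn(4, 32)        # e.g., features from eye-tracking input
    print(model(audio, gaze).shape)  # torch.Size([4, 256])

In a design like this, the softmax gate can weight the audio encoding more heavily for mouth regions and the gaze encoding more heavily around the eyes on a per-frame basis, which matches the behavior the abstract describes only in broad terms.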