Paper Title

Towards an objective characterization of an individual's facial movements using Self-Supervised Person-Specific-Models

Authors

Tazi, Yanis; Berger, Michael; Freiwald, Winrich A.

Abstract

Disentangling facial movements from other facial characteristics, particularly from facial identity, remains a challenging task, as facial movements display great variation between individuals. In this paper, we aim to characterize individual-specific facial movements. We present a novel training approach to learn facial movements independently of other facial characteristics, focusing on each individual separately. We propose self-supervised Person-Specific Models (PSMs), in which one model per individual learns to extract an embedding of facial movements, independently of the person's identity and other structural facial characteristics, from unlabeled facial video. These models are trained using encoder-decoder-like architectures. We provide quantitative and qualitative evidence that a PSM learns a meaningful facial embedding that discovers fine-grained movements otherwise not characterized by a General Model (GM), which is trained across individuals and characterizes general patterns of facial movements. We present quantitative and qualitative evidence that this approach is easily scalable and generalizable to new individuals: knowledge of facial movements learned on one person can quickly and effectively be transferred to a new person. Lastly, we propose a novel PSM that uses curriculum temporal learning to leverage the temporal contiguity between video frames. Our code, analysis details, and all pretrained models are available on GitHub and in the Supplementary Materials.
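
The core idea can be illustrated with a minimal, hypothetical sketch: a per-person encoder-decoder trained with a self-supervised reconstruction objective on unlabeled frames of a single individual, so that the low-dimensional bottleneck is free to encode frame-to-frame facial movement rather than identity (which is constant within that person's data). The input resolution, layer sizes, embedding dimension, and loss below are illustrative assumptions, not the authors' implementation; the actual architecture and training details are in the paper's GitHub repository and Supplementary Materials.

```python
# Hypothetical sketch of a self-supervised Person-Specific Model (PSM):
# a per-person encoder-decoder that compresses a face frame into a small
# "movement" embedding and reconstructs the frame from it. All sizes and
# the MSE objective are illustrative assumptions.
import torch
import torch.nn as nn

class PersonSpecificModel(nn.Module):
    def __init__(self, embedding_dim: int = 16):
        super().__init__()
        # Encoder: 64x64 grayscale face crop -> low-dimensional movement embedding.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1),   # 64 -> 32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32 -> 16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, embedding_dim),
        )
        # Decoder: embedding -> reconstructed frame.
        self.decoder = nn.Sequential(
            nn.Linear(embedding_dim, 64 * 16 * 16),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # 16 -> 32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),   # 32 -> 64
            nn.Sigmoid(),
        )

    def forward(self, frames: torch.Tensor):
        z = self.encoder(frames)      # facial-movement embedding
        recon = self.decoder(z)       # reconstructed frame
        return z, recon

# Self-supervised training on unlabeled video frames of one individual:
# since identity never varies within the training data, reconstruction
# pushes the bottleneck to capture frame-to-frame movement.
model = PersonSpecificModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
frames = torch.rand(8, 1, 64, 64)     # stand-in batch of face crops in [0, 1]
z, recon = model(frames)
loss = nn.functional.mse_loss(recon, frames)
loss.backward()
optimizer.step()
```

Under these assumptions, transferring to a new individual would amount to initializing a new PSM from the learned weights and fine-tuning it on that person's unlabeled video, which is the scalability property the abstract describes.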
