Paper Title

APB2FaceV2: Real-Time Audio-Guided Multi-Face Reenactment

Paper Authors

Jiangning Zhang, Xianfang Zeng, Chao Xu, Jun Chen, Yong Liu, Yunliang Jiang

Paper Abstract

Audio-guided face reenactment aims to generate a photorealistic face whose facial expression matches the input audio. However, current methods can only reenact one specific person once the model is trained, or require extra operations such as 3D rendering and image post-fusion to generate vivid faces. To address this challenge, we propose a novel \emph{R}eal-time \emph{A}udio-guided \emph{M}ulti-face reenactment approach named \emph{APB2FaceV2}, which can reenact different target faces among multiple persons, taking the corresponding reference face and driving audio signal as inputs. To enable end-to-end training and faster inference, we design a novel module named Adaptive Convolution (AdaConv) to inject audio information into the network, and adopt a lightweight backbone so that the network can run in real time on both CPU and GPU. Comparison experiments demonstrate the superiority of our approach over existing state-of-the-art methods, and further experiments show that our method is efficient and flexible for practical applications. Code is available at https://github.com/zhangzjn/APB2FaceV2
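To make the idea of an audio-conditioned adaptive convolution concrete, below is a minimal PyTorch-style sketch of one way audio features can modulate an image-feature convolution. The class name `AdaConvSketch`, the tensor shapes, and the scale/shift modulation scheme are assumptions for illustration, not the authors' actual AdaConv implementation.

```python
# Minimal sketch (assumption, not the paper's code): an MLP maps an audio
# feature vector to per-channel scale/shift parameters that modulate the
# output of an ordinary convolution, injecting audio into the image branch.
import torch
import torch.nn as nn


class AdaConvSketch(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, audio_dim: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Predict a per-channel scale and shift from the audio embedding.
        self.to_scale_shift = nn.Linear(audio_dim, 2 * out_ch)

    def forward(self, x: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # x: (B, in_ch, H, W) image features; audio: (B, audio_dim) audio features
        h = self.conv(x)
        scale, shift = self.to_scale_shift(audio).chunk(2, dim=1)
        scale = scale.unsqueeze(-1).unsqueeze(-1)  # (B, out_ch, 1, 1)
        shift = shift.unsqueeze(-1).unsqueeze(-1)
        return h * (1 + scale) + shift


if __name__ == "__main__":
    layer = AdaConvSketch(in_ch=64, out_ch=64, audio_dim=128)
    img_feat = torch.randn(2, 64, 32, 32)
    audio_feat = torch.randn(2, 128)
    print(layer(img_feat, audio_feat).shape)  # torch.Size([2, 64, 32, 32])
```

Because the audio only predicts lightweight per-channel parameters rather than full feature maps, a module of this kind keeps the backbone small, which is consistent with the paper's goal of real-time inference on CPU and GPU.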
