Title
Facial Expression Video Generation Based on Spatio-temporal Convolutional GAN: FEV-GAN
Authors
Abstract
Facial expression generation has long been an intriguing task for scientists and researchers around the globe. In this context, we present a novel approach for generating videos of the six basic facial expressions. Starting from a single neutral facial image and a label indicating the desired facial expression, we aim to synthesize a video of the given identity performing the specified expression. Our approach, referred to as FEV-GAN (Facial Expression Video GAN), is based on spatio-temporal convolutional GANs, which are known to model both content and motion in the same network. Previous methods based on such networks have shown a good ability to generate coherent videos with smooth temporal evolution, but they still suffer from low image quality and poor identity preservation. In this work, we address these problems by using a generator composed of two image encoders: the first is pre-trained for facial identity feature extraction, and the second extracts spatial features. We have qualitatively and quantitatively evaluated our model on two international facial expression benchmark databases: MUG and Oulu-CASIA NIR&VIS. Analysis of the experimental results demonstrates the effectiveness of our approach in generating videos of the six basic facial expressions while preserving the input identity, and shows that using both identity and spatial features enhances the decoder's ability to preserve the identity and generate high-quality videos. The code and the pre-trained model will be made publicly available soon.
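To make the described generator concrete, below is a minimal PyTorch sketch of the two-encoder design: a frozen identity encoder (standing in for the pre-trained face-feature network), a trainable spatial encoder, and a 3D-deconvolutional decoder that models content and motion jointly. This is an illustration under stated assumptions, not the authors' implementation: all layer sizes, the 16-frame output resolution, and the names (FEVGANGenerator, feat_dim, etc.) are hypothetical.

```python
# Hypothetical sketch of a FEV-GAN-style generator; dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FEVGANGenerator(nn.Module):
    """Two image encoders feed a 3D-convolutional decoder that emits a clip."""

    def __init__(self, num_expressions: int = 6, feat_dim: int = 128):
        super().__init__()
        self.num_expressions = num_expressions
        # Encoder 1: stand-in for the pre-trained facial-identity network;
        # frozen so adversarial training cannot corrupt identity features.
        self.identity_encoder = self._make_encoder(feat_dim)
        for p in self.identity_encoder.parameters():
            p.requires_grad = False
        # Encoder 2: trained from scratch to extract spatial features.
        self.spatial_encoder = self._make_encoder(feat_dim)
        # Decoder: 3D transposed convolutions expand the fused code into a
        # video, modeling content and motion in the same network.
        in_dim = 2 * feat_dim + num_expressions
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(in_dim, 256, kernel_size=(2, 4, 4)), nn.ReLU(),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    @staticmethod
    def _make_encoder(feat_dim: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )

    def forward(self, neutral_img: torch.Tensor, expr_label: torch.Tensor):
        # neutral_img: (B, 3, H, W); expr_label: (B,) class index in [0, 5].
        identity = self.identity_encoder(neutral_img)   # (B, feat_dim)
        spatial = self.spatial_encoder(neutral_img)     # (B, feat_dim)
        label = F.one_hot(expr_label, self.num_expressions).float()
        code = torch.cat([identity, spatial, label], dim=1)
        # Treat the fused code as a 1x1x1 volume and upsample it to a clip.
        code = code[:, :, None, None, None]             # (B, C, 1, 1, 1)
        return self.decoder(code)                       # (B, 3, T, H', W')

gen = FEVGANGenerator()
clip = gen(torch.randn(2, 3, 64, 64), torch.tensor([0, 3]))
print(clip.shape)  # torch.Size([2, 3, 16, 32, 32])
```

Concatenating the identity code, the spatial code, and the expression label before decoding mirrors the abstract's claim: the decoder sees both feature types, which is what is credited with improving identity preservation and video quality.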