Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition

Paper Title

Learning to Augment Expressions for Few-shot Fine-grained Facial Expression Recognition

Authors

Wenxuan Wang, Yanwei Fu, Qiang Sun, Tao Chen, Chenjie Cao, Ziqi Zheng, Guoqiang Xu, Han Qiu, Yu-Gang Jiang, Xiangyang Xue

Abstract

Affective computing and cognitive theory are widely used in modern human-computer interaction scenarios. Human faces, as the most prominent and easily accessible features, have attracted great attention from researchers. Because humans have rich emotions and a highly developed facial musculature, many fine-grained expressions occur in real-world applications. However, collecting and annotating a large number of facial images is extremely time-consuming, and may even require psychologists to categorize them correctly. To the best of our knowledge, existing expression datasets are limited to a few basic facial expressions, which is not sufficient to support our ambitions in developing successful human-computer interaction systems. To this end, this paper contributes a novel Fine-grained Facial Expression Database (F2ED), which includes more than 200k images with 54 facial expressions from 119 persons. Considering that uneven data distribution and a lack of samples are common in real-world scenarios, we further evaluate several few-shot expression learning tasks on F2ED, in which facial expressions must be recognized given only a few training instances. These tasks mimic the human ability to learn robust and general representations from few examples. To address such few-shot tasks, we propose a unified task-driven framework, the Compositional Generative Adversarial Network (Comp-GAN), which learns to synthesize facial images and thus augments the instances of few-shot expression classes. Extensive experiments are conducted on F2ED and on existing facial expression datasets, i.e., JAFFE and FER2013, to validate the efficacy of F2ED for pre-training facial expression recognition networks and the effectiveness of the proposed Comp-GAN in improving the performance of few-shot recognition tasks.
