Paper Title

Dynamic Adaptive Threshold based Learning for Noisy Annotations Robust Facial Expression Recognition

Authors

Darshan Gera, Naveen Siva Kumar Badveeti, Bobbili Veerendra Raj Kumar, S. Balasubramanian

Abstract

Real-world facial expression recognition (FER) datasets suffer from noisy annotations due to crowd-sourcing, ambiguity in expressions, the subjectivity of annotators, and inter-class similarity. Recent deep networks, however, have a strong capacity to memorize noisy annotations, leading to corrupted feature embeddings and poor generalization. To handle noisy annotations, we propose a dynamic FER learning framework (DNFER) in which clean samples are selected based on a dynamic class-specific threshold during training. Specifically, DNFER combines supervised training on the selected clean samples with unsupervised consistency training on all samples. During training, the mean posterior class probabilities over each mini-batch are used as dynamic class-specific thresholds to select clean samples for supervised training. Unlike other methods, this threshold is independent of the noise rate and does not require any clean data. In addition, to learn from all samples, the posterior distributions of weakly-augmented and strongly-augmented images are aligned using an unsupervised consistency loss. We demonstrate the robustness of DNFER on both synthetic and real noisy-annotated FER datasets such as RAF-DB, FERPlus, SFEW, and AffectNet.
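To make the selection rule concrete, below is a minimal PyTorch sketch of one DNFER-style training step: the per-class mean posterior over the mini-batch serves as the dynamic threshold for picking clean samples (supervised cross-entropy), while a consistency loss aligns the posteriors of the weak and strong views of all samples. The function name `dnfer_step`, the weighting factor `lam`, the KL-based consistency term, and the choice to detach the weak-view posterior are illustrative assumptions; the abstract does not specify these details.

```python
# Sketch only: illustrates the mechanism described in the abstract, not the
# authors' actual implementation.
import torch
import torch.nn.functional as F

def dnfer_step(model, x_weak, x_strong, noisy_labels, lam=1.0):
    """One training step: clean-sample supervision + consistency on all samples.

    x_weak / x_strong: weakly / strongly augmented views of the same images.
    noisy_labels: possibly noisy class annotations, shape (B,).
    lam: consistency-loss weight (assumed hyperparameter).
    """
    logits_weak = model(x_weak)             # (B, C)
    probs_weak = logits_weak.softmax(dim=1)

    # Dynamic class-specific threshold: mean posterior probability of each
    # class over the current mini-batch (independent of the noise rate).
    class_thresh = probs_weak.mean(dim=0)   # (C,)

    # A sample counts as "clean" if its posterior for the annotated class
    # exceeds that class's dynamic threshold.
    label_probs = probs_weak.gather(1, noisy_labels.unsqueeze(1)).squeeze(1)
    clean_mask = label_probs > class_thresh[noisy_labels]

    # Supervised cross-entropy on the selected clean samples only.
    if clean_mask.any():
        sup_loss = F.cross_entropy(logits_weak[clean_mask],
                                   noisy_labels[clean_mask])
    else:
        sup_loss = logits_weak.new_zeros(())

    # Unsupervised consistency on all samples: align the strong view's
    # posterior with the (detached) weak view's posterior via KL divergence.
    log_probs_strong = model(x_strong).log_softmax(dim=1)
    cons_loss = F.kl_div(log_probs_strong, probs_weak.detach(),
                         reduction="batchmean")

    return sup_loss + lam * cons_loss

# Smoke test with a random 10-class linear "model" (checks shapes only).
model = torch.nn.Linear(512, 10)
x_w, x_s = torch.randn(8, 512), torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
dnfer_step(model, x_w, x_s, labels).backward()
```

Because the threshold is recomputed from each mini-batch's own posteriors, the selection criterion adapts as the network trains, which is why no clean validation set or known noise rate is needed.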
