Paper Title
Group-Level Emotion Recognition Using a Unimodal Privacy-Safe Non-Individual Approach
Paper Authors
Paper Abstract
This article presents our unimodal, privacy-safe, and non-individual proposal for the audio-video group emotion recognition subtask at the Emotion Recognition in the Wild (EmotiW) Challenge 2020. This sub-challenge aims to classify in-the-wild videos into three categories: Positive, Neutral, and Negative. Recent deep learning models have shown tremendous advances in analyzing interactions between people, predicting human behavior, and affective evaluation. Nonetheless, their performance comes from individual-based analysis, i.e., summing up and averaging scores from individual detections, which inevitably leads to privacy issues. In this research, we investigated a frugal approach towards a model able to capture the global mood from the whole image without using face or pose detection, or any individual-based feature as input. The proposed methodology mixes state-of-the-art and dedicated synthetic corpora as training sources. With an in-depth exploration of neural network architectures for group-level emotion recognition, we built a VGG-based model achieving 59.13% accuracy on the VGAF test set (eleventh place in the challenge). Given that the analysis is unimodal, based only on global features, and that the performance is evaluated on a real-world dataset, these results are promising and let us envision extending this model to multimodality for classroom ambiance evaluation, our final target application.