Paper Title
A Generalized Zero-Shot Framework for Emotion Recognition from Body Gestures
Paper Authors
Paper Abstract
Although automatic emotion recognition from facial expressions and speech has made remarkable progress, emotion recognition from body gestures has not been thoroughly explored. People often use a variety of body language to express emotions, and it is difficult to enumerate all emotional body gestures and collect enough samples for each category. Therefore, recognizing new emotional body gestures is critical for better understanding human emotions. However, existing methods fail to accurately determine which emotional state a new body gesture belongs to. To solve this problem, we introduce a Generalized Zero-Shot Learning (GZSL) framework, which consists of three branches to infer the emotional state of new body gestures using only their semantic descriptions. The first branch is a Prototype-Based Detector (PBD), which determines whether a sample belongs to a seen body gesture category and produces predictions for samples from the seen categories. The second branch is a Stacked AutoEncoder (StAE) with manifold regularization, which utilizes semantic representations to predict samples from unseen categories. Note that both of these branches perform body gesture recognition. We further add an emotion classifier with a softmax layer as the third branch to better learn feature representations for the emotion classification task. The input features of these three branches are learned by a shared feature extraction network, i.e., a Bidirectional Long Short-Term Memory (BLSTM) network with a self-attention module. We treat the three branches as subtasks and train them jointly with a multi-task learning strategy. On an emotion recognition dataset, our framework significantly outperforms traditional emotion classification methods and state-of-the-art zero-shot learning methods.
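To make the described architecture concrete, below is a minimal PyTorch sketch of the three-branch design: a shared BLSTM encoder with self-attention pooling feeding a prototype-based detector, a stacked autoencoder that maps features into a semantic space, and a softmax emotion classifier, trained jointly with a multi-task loss. All dimensions, class counts, loss weighting, and the `GestureGZSL` name are illustrative assumptions rather than the authors' implementation, and the StAE's manifold regularization term is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GestureGZSL(nn.Module):
    """Illustrative three-branch GZSL model (hypothetical dimensions)."""

    def __init__(self, input_dim=64, hidden_dim=128, semantic_dim=300,
                 num_seen_gestures=10, num_emotions=6):
        super().__init__()
        # Shared feature extractor: BLSTM with self-attention pooling.
        self.blstm = nn.LSTM(input_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        feat_dim = 2 * hidden_dim
        self.attn = nn.Linear(feat_dim, 1)  # scores each time step

        # Branch 1: Prototype-Based Detector (PBD). One learnable prototype
        # per seen gesture class; logits are negative squared distances.
        self.prototypes = nn.Parameter(torch.randn(num_seen_gestures, feat_dim))

        # Branch 2: Stacked AutoEncoder (StAE). The encoder maps features
        # into the semantic space; the decoder reconstructs the features.
        # (The paper's manifold regularization term is omitted here.)
        self.stae_enc = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
                                      nn.Linear(hidden_dim, semantic_dim))
        self.stae_dec = nn.Sequential(nn.Linear(semantic_dim, hidden_dim), nn.ReLU(),
                                      nn.Linear(hidden_dim, feat_dim))

        # Branch 3: emotion classifier; the softmax is folded into the
        # cross-entropy loss during training.
        self.emotion_head = nn.Linear(feat_dim, num_emotions)

    def forward(self, x):
        # x: (batch, time, input_dim) sequences of body-gesture features.
        h, _ = self.blstm(x)                    # (batch, time, feat_dim)
        w = torch.softmax(self.attn(h), dim=1)  # attention weights over time
        feat = (w * h).sum(dim=1)               # pooled representation

        pbd_logits = -torch.cdist(feat, self.prototypes) ** 2
        sem = self.stae_enc(feat)               # predicted semantic vector
        recon = self.stae_dec(sem)              # reconstruction of `feat`
        emo_logits = self.emotion_head(feat)
        return pbd_logits, sem, recon, feat, emo_logits

# Joint multi-task training step on dummy data (all tensors are placeholders).
model = GestureGZSL()
x = torch.randn(8, 50, 64)               # 8 gesture sequences, 50 frames each
gesture_y = torch.randint(0, 10, (8,))   # seen gesture labels
emotion_y = torch.randint(0, 6, (8,))    # emotion labels
semantic_y = torch.randn(8, 300)         # e.g., embeddings of label descriptions

pbd_logits, sem, recon, feat, emo_logits = model(x)
loss = (F.cross_entropy(pbd_logits, gesture_y)     # PBD branch
        + F.mse_loss(sem, semantic_y)              # StAE semantic alignment
        + F.mse_loss(recon, feat.detach())         # StAE reconstruction
        + F.cross_entropy(emo_logits, emotion_y))  # emotion branch
loss.backward()
```

At inference time, one natural routing rule under this sketch is to accept the PBD prediction when its maximum prototype score exceeds a threshold, and otherwise assign the sample to the unseen class whose semantic embedding is nearest to the StAE encoder output; the abstract states only that the PBD separates seen from unseen samples, so this specific rule is an assumption.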