Paper Title
Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition
Authors
Abstract
Categorical speech emotion recognition is typically performed as a sequence-to-label problem, i.e., determining the discrete emotion label of the input utterance as a whole. One of the main challenges in practice is that most existing emotion corpora do not provide ground-truth labels for each segment; instead, labels are available only for whole utterances. To extract segment-level emotional information from such weakly labeled emotion corpora, we propose using multiple instance learning (MIL) to learn segment embeddings in a weakly supervised manner. Moreover, in a sufficiently long utterance, not all segments contain relevant emotional information. We therefore apply three attention-based neural network models to the learned segment embeddings to attend to the most salient parts of a speech utterance. Experiments on the CASIA corpus and the IEMOCAP database show results that are better than, or highly competitive with, those of other state-of-the-art approaches.
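The two stages described above can be illustrated with a minimal pure-Python sketch. It rests on assumptions that the abstract does not state. The segment length and hop (`seg_len`, `hop`) are hypothetical settings. Each segment simply inherits its utterance's label as its weak training label, which is the simplest MIL-style assumption; the paper's actual MIL objective may differ. The pooling is a generic softmax attention with a hypothetical learned scoring vector `w`, and it is not necessarily any of the paper's three attention models. The segment encoder that would produce the embeddings is omitted, and all function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def split_into_segments(frames: Sequence[Vector], seg_len: int, hop: int) -> List[List[Vector]]:
    """Cut an utterance (a sequence of frame feature vectors) into fixed-length,
    possibly overlapping segments. seg_len and hop are hypothetical settings;
    trailing frames that do not fill a whole segment are dropped."""
    if len(frames) <= seg_len:
        return [list(frames)]  # short utterance: keep it as a single segment
    return [list(frames[i:i + seg_len]) for i in range(0, len(frames) - seg_len + 1, hop)]


def weak_segment_labels(segments: Sequence[object], utterance_label: str) -> List[Tuple[object, str]]:
    """Assumed MIL-style weak labelling: the utterance (bag) label is the only
    supervision available, so every segment (instance) is paired with it."""
    return [(seg, utterance_label) for seg in segments]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(embeddings: Sequence[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """Generic attention pooling (an assumption, not the paper's exact model):
    score each segment embedding with the hypothetical vector w, normalize the
    scores with softmax, and return the weighted sum plus the weights."""
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in embeddings]
    alphas = softmax(scores)
    dim = len(embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, embeddings)) for d in range(dim)]
    return pooled, alphas


if __name__ == "__main__":
    frames = [[float(i)] for i in range(10)]  # toy 1-D "features" for 10 frames
    segments = split_into_segments(frames, seg_len=4, hop=2)
    training_pairs = weak_segment_labels(segments, "angry")
    print(len(segments), training_pairs[0][1])  # 4 angry

    # Toy segment embeddings; in practice these would come from the MIL-trained encoder.
    embeddings = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    pooled, alphas = attention_pool(embeddings, w=[2.0, 0.0])
    print([round(a, 3) for a in alphas], pooled)
```

Because the attention weights sum to one, segments with higher scores dominate the pooled utterance vector. This is how an attention layer can down-weight segments that carry little emotional information before the utterance-level classifier is applied.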