Paper title

Advancing Multiple Instance Learning with Attention Modeling for Categorical Speech Emotion Recognition

Authors

Shuiyang Mao, P. C. Ching, C.-C. Jay Kuo, Tan Lee

Abstract

Categorical speech emotion recognition is typically performed as a sequence-to-label problem, i.e., the task is to determine a single discrete emotion label for the input utterance as a whole. One of the main challenges in practice is that most existing emotion corpora do not provide ground-truth labels for each segment; labels are available only for whole utterances. To extract segment-level emotional information from such weakly labeled emotion corpora, we propose using multiple instance learning (MIL) to learn segment embeddings in a weakly supervised manner. Moreover, in a sufficiently long utterance, not all segments contain relevant emotional information. Three attention-based neural network models are therefore applied to the learned segment embeddings to attend to the most salient parts of a speech utterance. Experiments on the CASIA corpus and the IEMOCAP database show results that are better than, or highly competitive with, those of other state-of-the-art approaches.
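
The idea of attending to salient segments can be illustrated with a minimal sketch of attention pooling over a bag of segment embeddings. This is not one of the three models from the paper, whose details the abstract does not give; it is a generic dot-product attention pooling written in plain Python. The names `attention_pool` and `query` and the toy embedding values are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(segments, query):
    """Pool segment embeddings into a single utterance embedding.

    segments: list of equal-length float lists, one per segment (the MIL "bag").
    query: hypothetical attention vector of the same length; in a real model
           it would be learned, here it is supplied by hand.
    Returns (utterance_embedding, attention_weights).
    """
    if not segments:
        raise ValueError("an utterance needs at least one segment")
    # One scalar relevance score per segment (dot product with the query).
    scores = [sum(q * x for q, x in zip(query, seg)) for seg in segments]
    weights = softmax(scores)
    dim = len(segments[0])
    # Weighted sum of segment embeddings, one dimension at a time.
    pooled = [sum(w * seg[d] for w, seg in zip(weights, segments))
              for d in range(dim)]
    return pooled, weights


# Toy bag: three 2-dimensional segment embeddings (made-up values).
segments = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]
utterance_embedding, weights = attention_pool(segments, query)
```

In this toy bag the second segment has the highest score, so it receives the largest weight and dominates the pooled embedding. In a weakly supervised setup only the utterance-level label would be used to train on top of such a pooled vector.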
