Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

Paper title

Emotion Invariant Speaker Embeddings for Speaker Identification with Emotional Speech

Authors

Sarma, Biswajit Dev; Das, Rohan Kumar

Abstract

A speaker's emotional state has a significant effect on speech production and can make speech deviate from that produced in a neutral state. This makes identifying speakers across different emotions a challenging task, because speaker models are generally trained on neutral speech. In this work, we propose to overcome this problem by creating emotion invariant speaker embeddings. We learn an extractor network that maps test embeddings with different emotions, obtained using an i-vector based system, to an emotion invariant space. The resulting test embeddings thus become emotion invariant and thereby compensate for the mismatch between emotional states. The studies are conducted using four emotion classes from the IEMOCAP database. We obtain an absolute improvement of 2.6% in speaker identification accuracy using emotion invariant speaker embeddings, compared with an average speaker model based framework with different emotions.
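
The idea of mapping emotional test embeddings into an emotion invariant space can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' method: the embeddings are synthetic stand-ins for i-vectors, emotion is modelled as a fixed additive shift per class, the "extractor" is a single linear layer (not the network from the paper) trained by gradient descent to regress each emotional embedding onto the same speaker's neutral embedding, and identification uses cosine scoring. All names (`neutral`, `shift`, `apply_map`, `identify`) are hypothetical.

```python
import math
import random

random.seed(0)
DIM, N_SPEAKERS, N_EMOTIONS = 4, 3, 4  # toy sizes; four emotion classes as in the abstract

# Hypothetical stand-ins for i-vectors: one neutral embedding per speaker.
neutral = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(N_SPEAKERS)]
# Assumed emotion effect: a fixed additive shift per emotion class.
shift = [[random.uniform(-0.5, 0.5) for _ in range(DIM)] for _ in range(N_EMOTIONS)]

def emotional(spk, emo):
    """Synthetic emotional embedding: neutral vector + emotion shift + small noise."""
    return [n + s + random.uniform(-0.05, 0.05)
            for n, s in zip(neutral[spk], shift[emo])]

# Training pairs: (emotional embedding, neutral embedding of the same speaker).
pairs = [(emotional(k, m), neutral[k])
         for k in range(N_SPEAKERS) for m in range(N_EMOTIONS)]

def apply_map(W, b, x):
    """Linear extractor y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def mse(W, b):
    """Mean squared distance between mapped embeddings and their neutral targets."""
    return sum(sum((y - t) ** 2 for y, t in zip(apply_map(W, b, x), tgt))
               for x, tgt in pairs) / len(pairs)

# Start from the identity map and run full-batch gradient descent on the MSE.
W = [[1.0 if i == j else 0.0 for j in range(DIM)] for i in range(DIM)]
b = [0.0] * DIM
initial_loss = mse(W, b)
LR = 0.02
for _ in range(300):
    gW = [[0.0] * DIM for _ in range(DIM)]
    gb = [0.0] * DIM
    for x, tgt in pairs:
        y = apply_map(W, b, x)
        for i in range(DIM):
            r = 2.0 * (y[i] - tgt[i]) / len(pairs)  # d(loss)/d(y_i) for this pair
            gb[i] += r
            for j in range(DIM):
                gW[i][j] += r * x[j]
    W = [[W[i][j] - LR * gW[i][j] for j in range(DIM)] for i in range(DIM)]
    b = [b[i] - LR * gb[i] for i in range(DIM)]
final_loss = mse(W, b)

def cosine(u, v):
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    return sum(a * c for a, c in zip(u, v)) / (norm_u * norm_v)

def identify(x):
    """Map a test embedding, then pick the speaker with the highest cosine score."""
    y = apply_map(W, b, x)
    return max(range(N_SPEAKERS), key=lambda k: cosine(y, neutral[k]))
```

Calling `identify(emotional(0, 1))` maps a synthetic emotional embedding for speaker 0 and returns the index of the best-scoring speaker model. Comparing `initial_loss` with `final_loss` shows how much of the assumed emotion shift the learned map removes on this toy data; it says nothing about the accuracy figures reported in the abstract.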
