An Overview and Analysis of Sequence-to-Sequence Emotional Voice Conversion

Paper Title

An Overview & Analysis of Sequence-to-Sequence Emotional Voice Conversion

Authors

Zijiang Yang, Xin Jing, Andreas Triantafyllopoulos, Meishu Song, Ilhan Aslan, Björn W. Schuller

Abstract

Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond. However, EVC remains an unsolved research problem with several challenges. In particular, as speech rate and rhythm are two key factors of emotional conversion, models have to generate output sequences of differing length. Sequence-to-sequence modelling is recently emerging as a competitive paradigm for models that can overcome those challenges. In an attempt to stimulate further research in this promising new direction, recent sequence-to-sequence EVC papers were systematically investigated and reviewed from six perspectives: their motivation, training strategies, model architectures, datasets, model inputs, and evaluation methods. This information is organised to provide the research community with an easily digestible overview of the current state-of-the-art. Finally, we discuss existing challenges of sequence-to-sequence EVC.
