Paper Title

DisC-VC: Disentangled and F0-Controllable Neural Voice Conversion

Authors

Chihiro Watanabe, Hirokazu Kameoka

Abstract

Voice conversion is a task to convert a non-linguistic feature of a given utterance. Since naturalness of speech strongly depends on its pitch pattern, in some applications, it would be desirable to keep the original rise/fall pitch pattern while changing the speaker identity. Some of the existing methods address this problem by either using a source-filter model or developing a neural network that takes an F0 pattern as input to the model. Although the latter approach can achieve relatively high sound quality compared to the former one, there is no consideration for discrepancy between the target and generated F0 patterns in its training process. In this paper, we propose a new variational-autoencoder-based voice conversion model accompanied by an auxiliary network, which ensures that the conversion result correctly reflects the specified F0/timbre information. We show the effectiveness of the proposed method by objective and subjective evaluations.
