Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

Paper Title

Unified Source-Filter GAN with Harmonic-plus-Noise Source Excitation Generation

Authors

Reo Yoneyama, Yi-Chiao Wu, Tomoki Toda

Abstract

This paper introduces a unified source-filter network with a harmonic-plus-noise source excitation generation mechanism. In our previous work, we proposed the unified Source-Filter GAN (uSFGAN) to develop a high-fidelity neural vocoder with flexible voice controllability using a unified source-filter neural network architecture. However, uSFGAN's capability to model the aperiodic source excitation signal is insufficient, and a gap in sound quality remains between natural and generated speech. To improve the source excitation modeling and the generated sound quality, we propose a new source excitation generation network that generates the periodic and aperiodic components separately. The advanced adversarial training procedure of HiFiGAN is also adopted in place of the Parallel WaveGAN procedure used in the original uSFGAN. Both objective and subjective evaluation results show that the modified uSFGAN significantly improves on the sound quality of the basic uSFGAN while maintaining voice controllability.
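
For readers unfamiliar with the term, the sketch below illustrates the classical signal-processing idea of a harmonic-plus-noise excitation: a periodic component driven by the fundamental frequency (F0) plus an aperiodic noise component. It is not the paper's neural source network. The function name, the 24 kHz sample rate, the noise level, and the single-sinusoid periodic part are all assumptions chosen for illustration.

```python
import math
import random


def harmonic_plus_noise_excitation(f0, sample_rate=24000, noise_std=0.1, seed=0):
    """Build a toy excitation from a per-sample F0 contour (Hz; 0 means unvoiced).

    Hypothetical helper for illustration only, not the uSFGAN source network.
    Returns (periodic, aperiodic, excitation) as lists of floats.
    """
    rng = random.Random(seed)
    phase = 0.0
    periodic, aperiodic = [], []
    for hz in f0:
        if hz > 0:
            # Voiced sample: advance the phase by the instantaneous frequency.
            phase = (phase + 2.0 * math.pi * hz / sample_rate) % (2.0 * math.pi)
            periodic.append(math.sin(phase))
        else:
            # Unvoiced sample: no periodic component.
            periodic.append(0.0)
        # Aperiodic component: Gaussian noise at every sample (assumed level).
        aperiodic.append(rng.gauss(0.0, noise_std))
    # The excitation is the sample-wise sum of the two components.
    excitation = [p + a for p, a in zip(periodic, aperiodic)]
    return periodic, aperiodic, excitation


# 10 ms voiced at 200 Hz followed by 10 ms unvoiced, at the assumed 24 kHz rate.
f0 = [200.0] * 240 + [0.0] * 240
periodic, aperiodic, excitation = harmonic_plus_noise_excitation(f0)
```

In this toy version the two components are produced independently and simply summed; according to the abstract, the proposed network instead learns to generate the periodic and aperiodic components separately.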

