Paper Title

HooliGAN: Robust, High Quality Neural Vocoding

Authors

Ollie McCarthy, Zohaib Ahmed

Abstract

Recent developments in generative models have shown that deep learning combined with traditional digital signal processing (DSP) techniques can successfully generate convincing violin samples [1], that source-excitation combined with WaveNet yields high-quality vocoders [2, 3], and that generative adversarial network (GAN) training can improve naturalness [4, 5]. By combining the ideas in these models, we introduce HooliGAN, a robust vocoder that achieves state-of-the-art results, fine-tunes very well to smaller datasets (<30 minutes of speech data), and generates audio at 2.2 MHz on GPU and 35 kHz on CPU. We also show a simple modification to Tacotron-based models that allows seamless integration with HooliGAN. Results from our listening tests show that the proposed model consistently outputs high-quality audio across a variety of datasets, both large and small. We provide samples at the following demo page: https://resemble-ai.github.io/hooligan_demo/
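
The generation speeds above (2.2 MHz on GPU, 35 kHz on CPU) are most easily read as a real-time factor: audio samples produced per second of wall-clock time, divided by the output sample rate. The sketch below is illustrative only. It assumes a 22,050 Hz output sample rate, which the abstract does not state, and `real_time_factor` is a hypothetical helper, not part of HooliGAN.

```python
# Hypothetical helper: convert a vocoder's generation throughput
# (audio samples produced per second) into a real-time factor.
# ASSUMPTION: the output sample rate is 22,050 Hz; the abstract does not state it.

ASSUMED_SAMPLE_RATE_HZ = 22_050


def real_time_factor(samples_per_second: float,
                     sample_rate_hz: float = ASSUMED_SAMPLE_RATE_HZ) -> float:
    """Seconds of audio generated per second of wall-clock time."""
    return samples_per_second / sample_rate_hz


gpu_rtf = real_time_factor(2.2e6)  # 2.2 MHz reported on GPU
cpu_rtf = real_time_factor(35e3)   # 35 kHz reported on CPU

print(f"GPU: {gpu_rtf:.1f}x real time")  # prints "GPU: 99.8x real time"
print(f"CPU: {cpu_rtf:.2f}x real time")  # prints "CPU: 1.59x real time"
```

Any factor above 1 means faster than real time. Under a different sample-rate assumption (for example 24 kHz) both factors scale down proportionally, but the CPU figure stays above real time at common speech sample rates up to 35 kHz.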