HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Paper Title

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

Authors

Jiaqi Su, Zeyu Jin, Adam Finkelstein

Abstract

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-frequency domain. It relies on the deep feature matching losses of the discriminators to improve the perceptual quality of enhanced speech. The proposed model generalizes well to new speakers, new speech content, and new environments. It significantly outperforms state-of-the-art baseline methods in both objective and subjective experiments.
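The abstract's "deep feature matching losses of the discriminators" can be illustrated with a minimal sketch. It assumes the common formulation in which the loss is the mean absolute difference between a discriminator's intermediate features for clean speech and for enhanced speech, averaged over layers; the abstract does not state the exact weighting, so this is not necessarily the authors' formula. The function name `feature_matching_loss` and the flat-list feature layout are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def feature_matching_loss(
    real_feats: Sequence[Sequence[float]],
    fake_feats: Sequence[Sequence[float]],
) -> float:
    """Average per-layer L1 distance between discriminator features.

    real_feats[i] holds layer i's activations for clean speech and
    fake_feats[i] holds layer i's activations for enhanced speech.
    Assumption: each layer is flattened to a list of floats and every
    layer contributes equally to the total.
    """
    if len(real_feats) != len(fake_feats):
        raise ValueError("both inputs must cover the same layers")
    if not real_feats:
        raise ValueError("at least one layer is required")

    total = 0.0
    for real, fake in zip(real_feats, fake_feats):
        if len(real) != len(fake) or not real:
            raise ValueError("layer features must be non-empty and equal in size")
        # Mean absolute difference normalizes by the layer's size.
        total += sum(abs(r - f) for r, f in zip(real, fake)) / len(real)
    return total / len(real_feats)


if __name__ == "__main__":
    clean = [[1.0, 2.0], [0.0, 0.0, 3.0]]
    enhanced = [[1.0, 4.0], [0.0, 0.0, 0.0]]
    print(feature_matching_loss(clean, enhanced))
```

In a real training loop the features would be tensors taken from each discriminator (time-domain and time-frequency), and this term would be added to the generator's adversarial loss.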
