NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates

Paper Title

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates

Authors

Seungu Han, Junhyeok Lee

Abstract

Conventional audio super-resolution models fix the initial and target sampling rates, so a separate model must be trained for each pair of sampling rates. We introduce NU-Wave 2, a diffusion model for neural audio upsampling that generates 48 kHz audio signals from inputs of various sampling rates with a single model. Building on the architecture of NU-Wave, NU-Wave 2 uses short-time Fourier convolution (STFC) to generate harmonics, which resolves the main failure modes of NU-Wave, and incorporates bandwidth spectral feature transform (BSFT) to condition the bandwidths of inputs in the frequency domain. We experimentally demonstrate that NU-Wave 2 produces high-resolution audio regardless of the input sampling rate while requiring fewer parameters than other models. The official code and audio samples are available at https://mindslab-ai.github.io/nuwave2.
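The abstract does not say how input bandwidth is represented, so the following is only a rough sketch of what a frequency-domain bandwidth cue could look like, not the authors' BSFT. It assumes the low-rate input has already been placed on the 48 kHz grid and marks which STFT bins lie at or below the input's original Nyquist frequency. The function name `band_mask` and the `n_fft` value are hypothetical choices made for this illustration.

```python
import numpy as np

TARGET_SR = 48_000  # target sampling rate stated in the abstract


def band_mask(source_sr: int, n_fft: int = 1024, target_sr: int = TARGET_SR) -> np.ndarray:
    """Return a 0/1 vector over rFFT bins (hypothetical helper, not from the paper).

    Bin k of an n_fft-point rFFT at target_sr sits at k * target_sr / n_fft Hz.
    An input originally sampled at source_sr carries no content above
    source_sr / 2, so only bins at or below that frequency are marked 1.
    """
    if not 0 < source_sr <= target_sr:
        raise ValueError("source_sr must be in (0, target_sr]")
    bins = np.arange(n_fft // 2 + 1)
    # Integer form of: k * target_sr / n_fft <= source_sr / 2
    return (2 * bins * target_sr <= source_sr * n_fft).astype(np.float32)


if __name__ == "__main__":
    for sr in (8_000, 16_000, 24_000, 48_000):
        mask = band_mask(sr)
        print(sr, int(mask.sum()), "of", mask.size, "bins occupied")
```

A single model could, in principle, receive such a mask alongside the input spectrogram so that it knows which band needs to be generated; whether NU-Wave 2 does this in exactly this form should be checked against the official code.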
