Unsupervised vocal dereverberation with diffusion-based generative models

Paper title

Unsupervised vocal dereverberation with diffusion-based generative models

Authors

Koichi Saito, Naoki Murata, Toshimitsu Uesaka, Chieh-Hsin Lai, Yuhta Takida, Takao Fukui, Yuki Mitsufuji

Abstract

Removing reverb from reverberant music is a necessary technique for cleaning up audio for downstream music manipulations. Reverberation in music falls into two categories: natural reverb and artificial reverb. Artificial reverb is more diverse than natural reverb because of its various parameter setups and reverberation types. However, recent supervised dereverberation methods may fail because they rely on sufficiently diverse and numerous pairs of reverberant observations and retrieved data for training in order to generalize to unseen observations during inference. To resolve these problems, we propose an unsupervised method that can remove a general kind of artificial reverb from music without requiring paired data for training. The proposed method is based on diffusion models: it initializes the unknown reverberation operator with a conventional signal processing technique and simultaneously refines the estimate with the help of diffusion models. We show through objective and perceptual evaluations that our method outperforms the current leading vocal dereverberation benchmarks.
