Realistic Gramophone Noise Synthesis using a Diffusion Model

Paper title

Realistic Gramophone Noise Synthesis using a Diffusion Model

Authors

Moliner, Eloi; Välimäki, Vesa

Abstract

This paper introduces a novel data-driven strategy for synthesizing gramophone noise audio textures. A diffusion probabilistic model is applied to generate highly realistic quasiperiodic noises. The proposed model is designed to generate samples whose length equals one disk revolution, and a method for generating plausible periodic variations between revolutions is also proposed. A guided approach is also applied as a conditioning method, in which an audio signal generated with manually tuned signal processing is refined via reverse diffusion to improve realism. The method has been evaluated in a subjective listening test, in which the participants were often unable to distinguish the synthesized signals from the real ones. The synthetic noises produced with the best proposed unconditional method are statistically indistinguishable from real noise recordings. This work shows the potential of diffusion models for highly realistic audio synthesis tasks.

Paper title