Paper Title

DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability

Paper Authors

Kin Wai Cheuk, Ryosuke Sawata, Toshimitsu Uesaka, Naoki Murata, Naoya Takahashi, Shusuke Takahashi, Dorien Herremans, Yuki Mitsufuji

Paper Abstract

In this paper, we propose a novel generative approach, DiffRoll, to tackle automatic music transcription (AMT). Instead of treating AMT as a discriminative task in which the model is trained to convert spectrograms into piano rolls, we treat it as a conditional generative task in which we train our model to generate realistic-looking piano rolls from pure Gaussian noise, conditioned on spectrograms. This new AMT formulation enables DiffRoll to transcribe, generate, and even inpaint music. Due to its classifier-free nature, DiffRoll can also be trained on unpaired datasets where only piano rolls are available. Our experiments show that DiffRoll outperforms its discriminative counterpart by 19 percentage points (ppt.), and our ablation studies also indicate that it outperforms similar existing methods by 4.8 ppt. Source code and a demonstration are available at https://sony.github.io/DiffRoll/.
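The abstract's formulation can be sketched as a standard denoising-diffusion loop in which the spectrogram is a condition that is sometimes masked out during training (the classifier-free trick that allows learning from piano-roll-only data). The sketch below is a minimal, hypothetical illustration in NumPy: the dimensions, the mask value of -1, the dropout probability, and the stand-in `denoiser` function are all assumptions, not DiffRoll's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): an 88-pitch x 64-frame piano roll,
# conditioned on a 229-bin x 64-frame mel spectrogram.
T = 50                        # number of diffusion steps
ROLL_SHAPE = (88, 64)
SPEC_SHAPE = (229, 64)

betas = np.linspace(1e-4, 0.02, T)   # linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoiser(x_t, t, spec):
    """Stand-in for the trained noise-prediction network.
    A real model would be a neural network; this deterministic
    placeholder just keeps the loop runnable."""
    return 0.1 * x_t + 0.01 * spec.mean()

def q_sample(x0, t, eps):
    """Forward diffusion: noise a clean piano roll to step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def training_step(x0, spec, p_uncond=0.1):
    """One diffusion training step with classifier-free conditioning:
    with probability p_uncond the spectrogram is replaced by a constant
    mask, so the same model also learns unconditional generation and can
    therefore be trained on piano-roll-only (unpaired) data."""
    t = int(rng.integers(T))
    eps = rng.standard_normal(x0.shape)
    if rng.random() < p_uncond:
        spec = np.full_like(spec, -1.0)   # masked condition (assumed value)
    eps_hat = denoiser(q_sample(x0, t, eps), t, spec)
    return float(np.mean((eps - eps_hat) ** 2))  # epsilon-prediction loss

def transcribe(spec):
    """DDPM-style ancestral sampling: start from pure Gaussian noise and
    iteratively denoise, conditioned on the input spectrogram."""
    x = rng.standard_normal(ROLL_SHAPE)
    for t in reversed(range(T)):
        eps_hat = denoiser(x, t, spec)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(ROLL_SHAPE) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

loss = training_step(rng.standard_normal(ROLL_SHAPE), rng.standard_normal(SPEC_SHAPE))
roll = transcribe(rng.standard_normal(SPEC_SHAPE))
print(roll.shape)
```

Conditioning on a masked spectrogram with some probability is what lets a single model serve all three uses named in the abstract: transcription (full condition), generation (fully masked condition), and inpainting (partially masked input).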
