Paper Title

Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task

Paper Authors

Shangda Wu, Maosong Sun

Paper Abstract

Benefiting from large-scale datasets and pre-trained models, the field of generative models has recently gained significant momentum. However, most datasets for symbolic music are very small, which potentially limits the performance of data-driven multimodal models. An intuitive solution to this problem is to leverage pre-trained models from other modalities (e.g., natural language) to improve the performance of symbolic music-related multimodal tasks. In this paper, we carry out the first study of generating complete and semantically consistent symbolic music scores from text descriptions, and explore the efficacy of using publicly available checkpoints (i.e., BERT, GPT-2, and BART) for natural language processing in the task of text-to-music generation. Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity. We analyse the capabilities and limitations of our model to better understand the potential of language-music models.
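To make the setup concrete, the sketch below shows how a text-to-music sequence-to-sequence model could be warm-started from the public checkpoints the abstract names (BERT, GPT-2, BART) and scored with the two reported metrics, BLEU and edit distance similarity. This is a minimal illustration assuming the Hugging Face transformers and sacrebleu libraries; the checkpoint names, the beam-search settings, the placeholder reference score, and the exact normalisation used for edit distance similarity are our assumptions, not details taken from the paper.

```python
# A minimal sketch, not the authors' implementation: warm-starting a
# text-to-music seq2seq model from a public NLP checkpoint and scoring
# its output with BLEU and a normalised edit-distance similarity.
# Checkpoint names and the similarity formula are assumptions.
from transformers import BartForConditionalGeneration, BartTokenizer
import sacrebleu

# BART is already an encoder-decoder, so its checkpoint loads directly;
# the paper also explores BERT (encoder) and GPT-2 (decoder) checkpoints.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

def generate_score(description: str, max_len: int = 512) -> str:
    """Map a free-text description to a symbolic music token sequence."""
    inputs = tokenizer(description, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_length=max_len, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def edit_distance_similarity(hyp: str, ref: str) -> float:
    """1 - distance / max length, so 1.0 means an exact match (assumed form)."""
    if not hyp and not ref:
        return 1.0
    return 1.0 - levenshtein(hyp, ref) / max(len(hyp), len(ref))

# Hypothetical evaluation pair; the reference is an illustrative placeholder.
hyps = [generate_score("A cheerful folk tune in D major, 6/8 time.")]
refs = [["(ground-truth symbolic score here)"]]
print("BLEU:", sacrebleu.corpus_bleu(hyps, refs).score)
print("EDS :", edit_distance_similarity(hyps[0], refs[0][0]))
```

In practice the decoder's vocabulary would be adapted to the symbolic music representation before fine-tuning; the point of the sketch is only the warm-start-and-evaluate workflow the abstract describes.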
