Title
MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers
Authors
Abstract
Music annotation has always been one of the critical topics in the field of Music Information Retrieval (MIR). Traditional models use supervised learning for music annotation tasks. However, as supervised machine learning approaches grow in complexity, the need for ever more annotated training data often cannot be matched by the available data. In this paper, a new self-supervised music-acoustic representation learning approach named MusiCoder is proposed. Inspired by the success of BERT, MusiCoder builds upon the architecture of self-attention bidirectional transformers. Two pre-training objectives, Contiguous Frames Masking (CFM) and Contiguous Channels Masking (CCM), are designed to adapt BERT-like masked-reconstruction pre-training to the continuous acoustic frame domain. The performance of MusiCoder is evaluated on two downstream music annotation tasks. The results show that MusiCoder outperforms state-of-the-art models in both music genre classification and auto-tagging. The effectiveness of MusiCoder indicates the great potential of a new self-supervised approach to understanding music: first apply masked reconstruction tasks to pre-train a transformer-based model on massive unlabeled music-acoustic data, and then fine-tune the model on specific downstream tasks with labeled data.
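To illustrate the masking idea behind the two pre-training objectives, the sketch below applies CFM-style and CCM-style masking to a 2-D acoustic feature matrix (e.g. a log-mel spectrogram). The span lengths, mask ratio, and zero-fill strategy are illustrative assumptions, not the paper's actual hyperparameters; the masked positions would serve as reconstruction targets for the transformer.

```python
import numpy as np

def contiguous_frames_masking(features, span=5, mask_ratio=0.15, rng=None):
    """CFM sketch: zero out randomly chosen contiguous spans of time frames.

    features: (num_frames, num_channels) acoustic feature matrix.
    Returns the masked copy and a boolean vector marking hidden frames,
    which the model would be trained to reconstruct.
    Span length and mask ratio here are assumed values for illustration.
    """
    rng = rng or np.random.default_rng()
    num_frames = features.shape[0]
    masked = features.copy()
    hidden = np.zeros(num_frames, dtype=bool)
    # Number of spans chosen so roughly mask_ratio of frames are hidden.
    num_spans = max(1, int(num_frames * mask_ratio / span))
    for _ in range(num_spans):
        start = rng.integers(0, max(1, num_frames - span))
        masked[start:start + span, :] = 0.0
        hidden[start:start + span] = True
    return masked, hidden

def contiguous_channels_masking(features, span=8, rng=None):
    """CCM sketch: zero out one contiguous block of feature channels
    across all time frames (span width is an assumed value)."""
    rng = rng or np.random.default_rng()
    num_channels = features.shape[1]
    masked = features.copy()
    start = rng.integers(0, max(1, num_channels - span))
    masked[:, start:start + span] = 0.0
    return masked
```

In a BERT-style setup, both corruptions would be applied to unlabeled audio features during pre-training, with an L1/L2 reconstruction loss on the hidden regions only.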