Paper Title
cMelGAN: An Efficient Conditional Generative Model Based on Mel Spectrograms
Paper Authors
Paper Abstract
Analysing music in the field of machine learning is a very difficult problem with numerous constraints to consider. The nature of audio data, with its very high dimensionality and widely varying scales of structure, is one of the primary reasons why it is so difficult to model. There are many applications of machine learning in music, such as classifying the mood of a piece of music, conditional music generation, and popularity prediction. The goal of this project was to develop a genre-conditional generative model of music based on Mel spectrograms and to evaluate its performance by comparing it to existing generative music models that use note-based representations. We initially implemented an autoregressive, RNN-based generative model called MelNet. However, due to its slow speed and low-fidelity output, we decided to create a new, fully convolutional architecture, called cMelGAN, that is based on the MelGAN [4] and conditional GAN architectures.