Paper Title
Voice and accompaniment separation in music using self-attention convolutional neural network
Paper Authors
Paper Abstract
Music source separation has been a popular topic in signal processing for decades, not only because of its technical difficulty but also because of its importance to many commercial applications, such as automatic karaoke and remixing. In this work, we propose a novel self-attention network to separate voice and accompaniment in music. First, a convolutional neural network (CNN) with densely connected CNN blocks is built as our base network. We then insert self-attention subnets at different levels of the base CNN to exploit the long-term intra-dependency of music, i.e., repetition. Within the self-attention subnets, repetitions of the same musical patterns inform the reconstruction of other repetitions, leading to better source separation performance. Results show that the proposed method yields a 19.5% relative improvement in vocals separation in terms of SDR. We compare our method with state-of-the-art systems, i.e., MMDenseNet and MMDenseLSTM.
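The core idea in the abstract is that self-attention lets every time frame attend to similar frames elsewhere in the song, so repetitions of a musical pattern help reconstruct one another. The sketch below illustrates only that mechanism with plain numpy dot-product attention over time frames; it is a minimal, non-learned illustration, not the paper's actual subnet (which uses learned query/key/value projections inside a dense CNN), and the frame/feature shapes are hypothetical.

```python
import numpy as np

def self_attention(x):
    """Toy dot-product self-attention over time frames.

    x: array of shape (T, F) -- T time frames, F feature bins.
    Each output frame becomes a similarity-weighted mix of all
    frames, so repeated patterns reinforce each other.
    """
    # Pairwise frame similarity, scaled by sqrt(feature dim).
    scores = x @ x.T / np.sqrt(x.shape[1])            # (T, T)
    # Softmax over the key axis (numerically stabilized).
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # rows sum to 1
    return weights @ x                                # (T, F)

# Two identical frames (a "repetition") plus one distinct frame.
feats = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])
out = self_attention(feats)
```

Because frames 0 and 1 are identical, they attend to each other most strongly and receive identical outputs, which is the repetition-informs-reconstruction effect the abstract describes; the paper's learned version additionally projects features before computing attention.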