Paper Title
Deep Learning Based Source Separation Applied To Choir Ensembles
Paper Authors
Abstract
Choral singing is a widely practiced form of ensemble singing in which a group of people sing simultaneously in polyphonic harmony. The most common setting for choir ensembles consists of four parts: Soprano, Alto, Tenor and Bass (SATB), each with its own range of fundamental frequencies ($F_0$s). Source separation for this choral setting entails separating the SATB mixture into its constituent parts. Source separation for musical mixtures is well studied, and many deep learning based methodologies have been proposed for it. However, most of this research has focused on a typical case that consists of separating vocal, percussion and bass sources from a mixture, each of which has a distinct spectral structure. In contrast, the simultaneous and harmonic nature of ensemble singing leads to high structural similarity and overlap between the spectral components of the sources in a choral mixture, making source separation for choirs a harder task than the typical case. This, along with the lack of an appropriate consolidated dataset, has led to a dearth of research in the field so far. In this paper, we first assess how well some recently developed methodologies for musical source separation perform on SATB choirs. We then propose a novel domain-specific adaptation that conditions the recently proposed U-Net architecture for musical source separation on the fundamental frequency contour of each singing group, and we demonstrate that the proposed approach surpasses the results of domain-agnostic architectures.