Heterogeneous Target Speech Separation

Paper Title

Heterogeneous Target Speech Separation

Authors

Efthymios Tzinis, Gordon Wichern, Aswin Subramanian, Paris Smaragdis, Jonathan Le Roux

Abstract

We introduce a new paradigm for single-channel target source separation in which the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates generalization to new concepts with unseen out-of-domain data, while also performing substantially better than single-domain specialist models. Notably, such training leads to more robust learning of new, harder discriminative concepts for source separation and can yield improvements over permutation invariant training with oracle source selection. We analyze the intrinsic behavior of source separation training with heterogeneous metadata and propose ways to alleviate problems that emerge under challenging separation conditions. We release the preparation recipes for all datasets used, to further promote research on this challenging task.
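Two ideas in the abstract, a target source defined by a non-mutually exclusive concept and the permutation invariant training (PIT) baseline with oracle source selection, can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the authors' implementation or their released recipes: the metadata fields in `SOURCES_META`, the condition names in `RELATIVE` and `CATEGORICAL`, the rule that an ambiguous condition yields no target, and the use of SI-SDR as the oracle selection metric are all hypothetical choices made here for illustration.

```python
import numpy as np

# Hypothetical metadata for the two speakers in one mixture; the field names
# and values are illustrative only, not the paper's data format.
SOURCES_META = [
    {"loudness_db": -20.0, "gender": "female", "language": "en", "distance_m": 1.0},
    {"loudness_db": -26.0, "gender": "male", "language": "en", "distance_m": 0.5},
]

# Relative concepts (hypothetical) rank the sources; the top score is the target.
RELATIVE = {
    "louder": lambda m: m["loudness_db"],
    "closer": lambda m: -m["distance_m"],
}

# Categorical concepts (hypothetical) are predicates a source satisfies or not.
CATEGORICAL = {
    "female": lambda m: m["gender"] == "female",
    "english": lambda m: m["language"] == "en",
}


def select_target(meta, condition):
    """Return the index of the source singled out by `condition`, or None.

    Concepts are not mutually exclusive, so a condition can match both
    sources (e.g. two English speakers); this sketch treats such a mixture
    as ambiguous for that condition and returns no target.
    """
    if condition in RELATIVE:
        scores = [RELATIVE[condition](m) for m in meta]
        best = max(scores)
        winners = [i for i, s in enumerate(scores) if s == best]
    else:
        winners = [i for i, m in enumerate(meta) if CATEGORICAL[condition](m)]
    return winners[0] if len(winners) == 1 else None


def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB between an estimate and a reference."""
    est = est - est.mean()
    ref = ref - ref.mean()
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    proj = alpha * ref        # component of the estimate along the reference
    residual = est - proj     # everything else counts as distortion
    return 10 * np.log10((np.dot(proj, proj) + eps) / (np.dot(residual, residual) + eps))


def oracle_select(estimates, target):
    """Oracle selection for an unconditioned PIT model: pick the output
    slot whose estimate has the highest SI-SDR against the true target."""
    return int(np.argmax([si_sdr(e, target) for e in estimates]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s0, s1 = rng.standard_normal((2, 16000))
    noise = 0.1 * rng.standard_normal(16000)
    target_idx = select_target(SOURCES_META, "louder")
    # PIT outputs come in arbitrary order; here slot 1 holds source 0.
    estimates = [s1 + noise, s0 + noise]
    print(target_idx, oracle_select(estimates, [s0, s1][target_idx]))
```

Running the script prints the index of the target chosen by the "louder" condition and the PIT output slot the oracle picks for it. A conditioned separator, by contrast, would receive the condition as an extra input and be trained to output only the selected source, so no oracle is needed at inference time.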