Paper Title

Surrogate Source Model Learning for Determined Source Separation

Authors

Scheibler, Robin, Togami, Masahito

Abstract

We propose to learn surrogate functions of universal speech priors for determined blind speech separation. Deep speech priors are highly desirable due to their high modelling power, but they are not compatible with state-of-the-art independent vector analysis based on majorization-minimization (AuxIVA), since deriving the required surrogate function is not easy, nor always possible. Instead, we do away with exact majorization and directly approximate the surrogate. Taking advantage of iterative source steering (ISS) updates, we backpropagate the permutation-invariant separation loss through multiple iterations of AuxIVA. ISS lends itself well to this task due to its lower complexity and lack of matrix inversion. Experiments show large improvements in terms of scale-invariant signal-to-distortion ratio (SDR) and word error rate compared to baseline methods. Training is done on two-speaker mixtures, and we experiment with two losses, SDR and coherence. We find that the learnt approximate surrogate generalizes well to mixtures of three and four speakers without any modification. We also demonstrate generalization to a different variation of the AuxIVA update equations. The SDR loss leads to the fastest convergence in iterations, while coherence leads to the lowest word error rate (WER). We obtain as much as a 36% reduction in WER.
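To make the abstract's central mechanism concrete, below is a minimal NumPy sketch of one ISS sweep as used inside AuxIVA: each source is updated by a rank-1 "steering" subtraction, with no matrix inversion, which is why the iteration is cheap and easy to backpropagate through. The `weight_fn` argument stands in for the source model; `laplace_weights` implements the classic spherical-Laplace AuxIVA weights, whereas in the paper this role is played by a learned network (not shown here, and hypothetical in this sketch).

```python
import numpy as np

def laplace_weights(Y, eps=1e-6):
    """Classic AuxIVA spherical Laplace source model: w_s[t] = 1 / (2 ||y_s[:, t]||_2)."""
    r = np.linalg.norm(Y, axis=1)  # (n_src, n_frames)
    return 1.0 / (2.0 * r + eps)

def iss_one_sweep(Y, weight_fn=laplace_weights, eps=1e-6):
    """One sweep of iterative source steering (ISS) over all sources.

    Y: separated STFT signals, shape (n_src, n_freq, n_frames), complex.
    weight_fn: maps Y to per-source, per-frame weights, shape (n_src, n_frames).
    """
    n_src, n_freq, n_frames = Y.shape
    W = weight_fn(Y)
    for s in range(n_src):
        # Weighted cross-statistics of every output q against source s,
        # averaged over time frames.
        num = np.einsum("qn,qfn,fn->qf", W, Y, Y[s].conj()) / n_frames
        den = np.einsum("qn,fn->qf", W, np.abs(Y[s]) ** 2) / n_frames + eps
        v = num / den                          # steering vector, (n_src, n_freq)
        v[s] = 1.0 - 1.0 / np.sqrt(den[s])     # self-update keeps the scale consistent
        # Rank-1 update: subtract a steered copy of source s from all outputs.
        Y = Y - v[:, :, None] * Y[s][None]
    return Y
```

Because the sweep is just elementwise products, sums, and divisions, porting it to an autodiff framework and unrolling several sweeps (as the paper does for training the learned surrogate) is straightforward.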
