Paper Title
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks
Authors
Abstract
Because speech separation already performs very well on fully overlapped two-speaker speech, research attention has shifted to more realistic scenarios. However, domain mismatch between training and test conditions, caused by factors such as speaker, content, channel, and environment, remains a severe problem for speech separation. Speaker and environment mismatches have been studied in the existing literature. Nevertheless, there are few studies on speech content and channel mismatches. Moreover, the impacts of language and channel in these studies are mostly entangled. In this study, we create several datasets for various experiments. The results show that the impacts of different languages are small enough to be ignored compared to the impacts of different channels. In our experiments, training on data recorded by Android phones leads to the best generalizability. Moreover, we provide a new solution for channel mismatch by evaluating projection, where the channel similarity can be measured and used to effectively select additional training data to improve performance on in-the-wild test data.