Paper Title

Improving the transferability of speech separation by meta-learning

Paper Authors

Kuan-Po Huang, Yuan-Kuei Wu, Hung-yi Lee

Abstract

Speech separation aims to separate multiple speech sources from a speech mixture. Although speech separation is well solved on some existing English benchmarks, the generalizability of speech separation models to accents and languages unseen during training warrants more investigation. This paper adopts meta-learning-based methods to improve the transferability of speech separation models. With these methods, we found that even when training only on speech data with a single accent (native English), the models can still adapt to new, unseen accents from the Speech Accent Archive. Comparing the results with human-rated native-likeness of accents shows that the transferability of the MAML methods depends less on the similarity between training and testing data than that of typical transfer learning methods. Furthermore, we found that the models can handle data in different languages from the CommonVoice corpus at test time. Above all, the MAML methods outperform typical transfer learning methods on new accents, new speakers, new languages, and noisy environments.
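To illustrate the meta-learning idea the abstract refers to, below is a minimal, hedged sketch of first-order MAML (FOMAML) on a toy linear-regression problem, where each "task" (a different slope) stands in for a different accent. This is not the paper's actual separation model or training recipe; all function names, hyperparameters, and the task construction here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, x, y):
    # Gradient of MSE loss for the linear model y_hat = w * x.
    return np.mean(2 * (w * x - y) * x)

def mse(w, x, y):
    return np.mean((w * x - y) ** 2)

def sample_task(slope, n=20):
    # A "task" is regression toward a given slope (analogy: one accent).
    x = rng.uniform(-1, 1, n)
    return x, slope * x

def fomaml_train(slopes, inner_lr=0.1, outer_lr=0.05, steps=200):
    # First-order MAML: adapt on a support set with one inner gradient
    # step, then update the meta-parameters with the gradient evaluated
    # at the adapted parameters on a query set.
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for s in slopes:
            xs, ys = sample_task(s)  # support set
            xq, yq = sample_task(s)  # query set
            w_adapted = w - inner_lr * loss_grad(w, xs, ys)
            meta_grad += loss_grad(w_adapted, xq, yq)
        w -= outer_lr * meta_grad / len(slopes)
    return w

# Meta-train on "seen accents" (slopes near 1.0).
w_meta = fomaml_train(slopes=[0.8, 1.0, 1.2])

# Adapt to an "unseen accent" (slope 1.5) with a few inner steps.
xs, ys = sample_task(1.5)
w_new = w_meta
for _ in range(5):
    w_new = w_new - 0.1 * loss_grad(w_new, xs, ys)

xq, yq = sample_task(1.5)
print(mse(w_new, xq, yq), mse(w_meta, xq, yq))  # adaptation should reduce error
```

The key point mirrored from the paper: the meta-objective optimizes for *post-adaptation* performance, so the learned initialization transfers to new tasks after only a few gradient steps, rather than fitting the training distribution directly as typical transfer learning does.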
