Paper Title
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Paper Authors
Paper Abstract
In this paper, we provide a new perspective on self-supervised speech models based on how their training targets are obtained. We generalize the targets extractor into the Offline Targets Extractor (Off-TE) and the Online Targets Extractor (On-TE). Based on this, we propose MT4SSL, a new multi-task learning framework for self-supervised learning, which stands for Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets. MT4SSL uses the K-means algorithm as an Off-TE and a gradient-free teacher network as an On-TE. Our model outperforms previous SSL methods by nontrivial margins on the LibriSpeech benchmark, and is comparable to or even better than the best-performing models while using less data. Furthermore, we find that using both the Off-TE and the On-TE leads to better convergence during pre-training. Given both its effectiveness and efficiency, we believe that multi-task learning on self-supervised speech models, approached from this perspective, is a promising direction.
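
The abstract describes the framework only at a high level. The following is a minimal PyTorch sketch of that idea, pairing an offline cross-entropy objective over precomputed K-means cluster IDs (Off-TE) with a regression objective against a gradient-free EMA teacher (On-TE). Every module name, dimension, loss choice, and weighting below is an illustrative assumption, not the paper's actual configuration.

# Minimal sketch of the MT4SSL idea: one student encoder supervised by
# two target streams, offline K-means labels and an EMA teacher's outputs.
# All hyperparameters here are placeholders, not the paper's settings.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a speech encoder (a Transformer in the real model)."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x):
        return self.net(x)

class MT4SSLSketch(nn.Module):
    def __init__(self, dim=64, n_clusters=100, ema_decay=0.999):
        super().__init__()
        self.student = TinyEncoder(dim)
        self.teacher = copy.deepcopy(self.student)      # On-TE: never backpropagated
        for p in self.teacher.parameters():
            p.requires_grad_(False)
        self.cluster_head = nn.Linear(dim, n_clusters)  # predicts Off-TE targets
        self.regress_head = nn.Linear(dim, dim)         # predicts On-TE targets
        self.ema_decay = ema_decay

    @torch.no_grad()
    def update_teacher(self):
        # Exponential-moving-average update of the teacher from the student.
        for ps, pt in zip(self.student.parameters(), self.teacher.parameters()):
            pt.mul_(self.ema_decay).add_(ps, alpha=1 - self.ema_decay)

    def forward(self, x, kmeans_ids):
        h = self.student(x)                             # (batch, frames, dim)
        # Off-TE loss: cross-entropy against precomputed K-means cluster IDs.
        off_loss = F.cross_entropy(self.cluster_head(h).transpose(1, 2), kmeans_ids)
        # On-TE loss: regress the gradient-free teacher's representations.
        with torch.no_grad():
            target = self.teacher(x)
        on_loss = F.mse_loss(self.regress_head(h), target)
        return off_loss + on_loss                       # equal weighting: an assumption

# Usage: one toy pre-training step on random "features".
model = MT4SSLSketch()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(2, 50, 64)                              # (batch, frames, feature dim)
ids = torch.randint(0, 100, (2, 50))                    # offline K-means labels
loss = model(x, ids)
loss.backward()
opt.step()
model.update_teacher()

The MSE regression target and the equal loss weighting are placeholders; the point of the sketch is only the structure the abstract names, namely that one student encoder is jointly supervised by an offline target stream and an online, gradient-free one.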