Paper Title

Self-training and Pre-training are Complementary for Speech Recognition

Authors

Qiantong Xu, Alexei Baevski, Tatiana Likhomanenko, Paden Tomasello, Alexis Conneau, Ronan Collobert, Gabriel Synnaeve, Michael Auli

Abstract

Self-training and unsupervised pre-training have emerged as effective approaches to improve speech recognition systems using unlabeled data. However, it is not clear whether they learn similar patterns or if they can be effectively combined. In this paper, we show that pseudo-labeling and pre-training with wav2vec 2.0 are complementary in a variety of labeled data setups. Using just 10 minutes of labeled data from Libri-light as well as 53k hours of unlabeled data from LibriVox achieves WERs of 3.0%/5.2% on the clean and other test sets of Librispeech - rivaling the best published systems trained on 960 hours of labeled data only a year ago. Training on all labeled data of Librispeech achieves WERs of 1.5%/3.1%.
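As a rough illustration of the pseudo-labeling (self-training) step described in the abstract, the sketch below uses a publicly available wav2vec 2.0 CTC checkpoint from the Hugging Face `transformers` library to transcribe unlabeled audio into pseudo-labels. This is an assumption for illustration only: the checkpoint name, library, and greedy decoding are not the paper's setup, which uses fairseq models and beam-search decoding with a language model.

```python
# Minimal sketch (not the paper's pipeline): generate pseudo-labels for
# unlabeled audio with a fine-tuned wav2vec 2.0 checkpoint.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.eval()

def pseudo_label(waveform_16khz):
    """Transcribe a 16 kHz mono waveform (1-D float array) into text."""
    inputs = processor(waveform_16khz, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits      # (batch, time, vocab)
    predicted_ids = torch.argmax(logits, dim=-1)        # greedy CTC decoding
    return processor.batch_decode(predicted_ids)[0]

# The resulting (audio, pseudo-label) pairs would then be mixed with the
# small labeled set to re-train or fine-tune the acoustic model.
```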
