Paper Title

Application of Knowledge Distillation to Multi-task Speech Representation Learning

Authors

Mine Kerpicci, Van Nguyen, Shuhua Zhang, Erik Visser

Abstract


Model architectures such as wav2vec 2.0 and HuBERT have been proposed to learn speech representations from audio waveforms in a self-supervised manner. When they are combined with downstream tasks such as keyword spotting and speaker verification, they provide state-of-the-art performance. However, these models use a large number of parameters, the smallest version of which has 95 million parameters. This constitutes a challenge for edge AI device deployments. In this paper, we investigate the application of knowledge distillation to speech representation learning (SRL) models followed by joint fine-tuning with multiple downstream voice-activated tasks. In our experiments on two such tasks, our approach results in nearly 75% reduction in model size while suffering only 0.1% accuracy and 0.9% equal error rate degradation compared to the full-size model. In addition, we show that fine-tuning the SRL models results in a significant performance boost compared to using frozen SRL models.
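To illustrate the general idea, below is a minimal, hypothetical sketch of representation-level knowledge distillation for a speech representation learning (SRL) encoder, followed by joint fine-tuning with two downstream heads (keyword spotting and speaker verification). The loss formulation, layer choices, pooling, and weighting here are illustrative assumptions and are not taken from the paper.

```python
# Hedged sketch: distill a frozen teacher SRL encoder (e.g., wav2vec 2.0 / HuBERT)
# into a smaller student, then fine-tune the student jointly with task heads.
# All names, dimensions, and loss weights below are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


def distillation_loss(student_hidden, teacher_hidden, lam=1.0):
    """Match student frame-level representations to the frozen teacher's.

    student_hidden, teacher_hidden: (batch, frames, dim) tensors.
    Combines an L1 term with a log-sigmoid cosine-similarity term, a common
    choice when distilling transformer-based speech encoders.
    """
    l1 = F.l1_loss(student_hidden, teacher_hidden)
    cos = F.cosine_similarity(student_hidden, teacher_hidden, dim=-1)
    return l1 - lam * torch.log(torch.sigmoid(cos)).mean()


class MultiTaskHeads(nn.Module):
    """Lightweight heads for keyword spotting (classification) and
    speaker verification (embedding), fine-tuned jointly with the student."""

    def __init__(self, dim, num_keywords, spk_emb_dim=192):
        super().__init__()
        self.kws_head = nn.Linear(dim, num_keywords)
        self.spk_head = nn.Linear(dim, spk_emb_dim)

    def forward(self, hidden):            # hidden: (batch, frames, dim)
        pooled = hidden.mean(dim=1)       # simple mean pooling over frames
        kws_logits = self.kws_head(pooled)
        spk_emb = F.normalize(self.spk_head(pooled), dim=-1)
        return kws_logits, spk_emb


# Example joint fine-tuning step (student encoder NOT frozen, per the paper's finding):
#   kws_logits, spk_emb = heads(student(waveform))
#   loss = F.cross_entropy(kws_logits, keyword_labels) + speaker_loss(spk_emb, speaker_labels)
#   loss.backward(); optimizer.step()
```

The key design point reflected above is that the student encoder remains trainable during downstream fine-tuning; the abstract reports that this yields a significant performance boost over keeping the distilled SRL model frozen.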
