Paper Title

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models

Authors

Travis M. Bartley, Fei Jia, Krishna C. Puvvada, Samuel Kriman, Boris Ginsburg

Abstract

In this paper, we extend previous self-supervised approaches for language identification by experimenting with a Conformer-based architecture in a multilingual pre-training paradigm. We find that pre-trained speech models optimally encode language-discriminatory information in lower layers. Further, we demonstrate that the embeddings obtained from these layers are robust enough to classify unseen languages and different acoustic environments without additional training. After fine-tuning a pre-trained Conformer model on the VoxLingua107 dataset, we achieve results similar to current state-of-the-art systems for language identification. Moreover, our model accomplishes this with 5x fewer parameters. We open-source the model through the NVIDIA NeMo toolkit.
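Since the model is released through the NVIDIA NeMo toolkit, inference can be run in a few lines using NeMo's speaker/language-label API. The sketch below is illustrative only: the checkpoint name "langid_ambernet" is NVIDIA's publicly released spoken-language-ID model in NeMo, but whether it is the exact checkpoint from this paper is an assumption, as is the audio path "sample.wav".

```python
# Minimal sketch: load a pretrained language-ID model via NeMo and classify
# one audio file. Assumes `pip install nemo_toolkit[asr]` and a local WAV file.
import nemo.collections.asr as nemo_asr

# Download a pretrained checkpoint from NGC. "langid_ambernet" is NVIDIA's
# released language-ID model; treating it as this paper's model is an assumption.
model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(
    model_name="langid_ambernet"
)

# Predict the language label for a single audio file (hypothetical path).
predicted_lang = model.get_label("sample.wav")
print(predicted_lang)

# Alternatively, extract the utterance-level embedding; the abstract reports
# such embeddings can classify unseen languages without additional training.
embedding = model.get_embedding("sample.wav")
print(embedding.shape)
```

The same `EncDecSpeakerLabelModel` class serves both speaker and language labeling in NeMo, which is why a language-ID checkpoint loads through it.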
