Paper Title
A Survey of Multilingual Models for Automatic Speech Recognition
Paper Authors
Paper Abstract
Although Automatic Speech Recognition (ASR) systems have achieved human-like performance for a few languages, the majority of the world's languages do not have usable systems due to the lack of large speech datasets to train these models. Cross-lingual transfer is an attractive solution to this problem, because low-resource languages can potentially benefit from higher-resource languages, either through transfer learning or through joint training in the same multilingual model. The problem of cross-lingual transfer has been well studied in ASR; however, recent advances in Self-Supervised Learning are opening up avenues for unlabeled speech data to be used in multilingual ASR models, which can pave the way for improved performance on low-resource languages. In this paper, we survey the state of the art in multilingual ASR models that are built with cross-lingual transfer in mind. We present best practices for building multilingual models from research across diverse languages and techniques, discuss open questions, and provide recommendations for future work.