Title

Code Switched and Code Mixed Speech Recognition for Indic languages

Authors

Harveen Singh Chadha, Priyanshi Shah, Ankur Dhuriya, Neeraj Chhimwal, Anirudh Gupta, Vivek Raghavan

Abstract

Training multilingual automatic speech recognition (ASR) systems is challenging because acoustic and lexical information is typically language-specific. Training a multilingual system for Indic languages is even harder because open-source datasets are scarce and few results have been published for the different approaches. We compare the performance of an end-to-end multilingual speech recognition system with that of monolingual models conditioned on language identification (LID). The decoding information from the multilingual model is used for language identification and then combined with the monolingual models, giving a 50% improvement in WER across languages. We also propose a similar technique for the code-switched problem and achieve a WER of 21.77 and 28.27 on Hindi-English and Bengali-English respectively. Our work discusses how transformer-based ASR, especially wav2vec 2.0, can be applied to developing multilingual ASR and code-switched ASR for Indic languages.
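The LID-conditioned pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the decoding information is turned into a language label, so the script-counting heuristic below is an assumption, as are the names `identify_language`, `transcribe`, `SCRIPT_RANGES`, and the stand-in decoder callables. Only the Unicode block ranges are standard facts.

```python
from collections import Counter
from typing import Callable, Dict

# Unicode block ranges (inclusive) for a few Indic scripts.
# Mapping a script to a single language is an assumption of this sketch:
# Devanagari, for example, is shared by Hindi and Marathi.
SCRIPT_RANGES = {
    "hi": (0x0900, 0x097F),  # Devanagari
    "bn": (0x0980, 0x09FF),  # Bengali
    "ta": (0x0B80, 0x0BFF),  # Tamil
}

# Hypothetical decoder type: raw audio in, transcript out.
Decoder = Callable[[bytes], str]


def identify_language(text: str, default: str = "en") -> str:
    """Guess the language of a draft transcript by majority script."""
    counts: Counter = Counter()
    for ch in text:
        code_point = ord(ch)
        for lang, (low, high) in SCRIPT_RANGES.items():
            if low <= code_point <= high:
                counts[lang] += 1
                break
    if not counts:
        return default  # no Indic characters seen
    return counts.most_common(1)[0][0]


def transcribe(audio: bytes, multilingual: Decoder,
               monolingual: Dict[str, Decoder]) -> str:
    """Decode with the multilingual model, infer the language from its
    output, then re-decode with the matching monolingual model."""
    draft = multilingual(audio)
    lang = identify_language(draft)
    decoder = monolingual.get(lang)
    # Fall back to the multilingual draft if no monolingual model exists.
    return decoder(audio) if decoder is not None else draft
```

In practice the two decoder arguments would wrap real ASR models such as fine-tuned wav2vec 2.0 checkpoints; here they are plain callables so the routing logic can be read and tested in isolation.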
