Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages

Title

Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages

Authors

Astik Biswas, Emre Yılmaz, Febe de Wet, Ewald van der Westhuizen, Thomas Niesler

Abstract

This paper reports on the semi-supervised development of acoustic and language models for under-resourced, code-switched speech in five South African languages. Two approaches are considered. The first constructs four separate bilingual automatic speech recognisers (ASRs) corresponding to four different language pairs between which speakers switch frequently. The second uses a single, unified, five-lingual ASR system that represents all the languages (English, isiZulu, isiXhosa, Setswana and Sesotho). We evaluate the effectiveness of these two approaches when used to add additional data to our extremely sparse training sets. Results indicate that batch-wise semi-supervised training yields better results than a non-batch-wise approach. Furthermore, while the separate bilingual systems achieved better recognition performance than the unified system, they benefited more from pseudo-labels generated by the five-lingual system than from those generated by the bilingual systems.
