Title
Improving Language Identification of Accented Speech
Authors
Abstract
Language identification from speech is a common preprocessing step in many spoken language processing systems. In recent years, this field has seen rapid progress, mostly due to the use of self-supervised models pretrained on multilingual data and the use of large training corpora. This paper shows that for speech with a non-native or regional accent, the accuracy of spoken language identification systems drops dramatically, and that the accuracy of identifying the language is inversely correlated with the strength of the accent. We also show that using the output of a lexicon-free speech recognition system for the language in question improves language identification performance on accented speech by a large margin, without sacrificing accuracy on native speech. We obtain relative error rate reductions ranging from 35% to 63% over the state-of-the-art model across several non-native speech datasets.
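
The abstract reports results as relative error rate reductions. As a reading aid, here is a minimal sketch of how such a figure is typically computed, assuming the usual definition (baseline error minus new error, divided by baseline error). The function name and the error rates below are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error rate reduction as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates for illustration only (not results from the paper).
baseline = 0.20  # baseline system misidentifies 20% of utterances
improved = 0.10  # modified system misidentifies 10% of utterances

# An absolute drop of 10 points from a 20% baseline is a 50% relative reduction.
print(f"{relative_error_reduction(baseline, improved):.0%}")  # prints 50%
```

Under this definition, a 35% to 63% relative reduction means the improved system makes roughly one third to two thirds fewer errors than the baseline, regardless of how large the baseline error rate was in absolute terms.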