Title

Multi-task Language Modeling for Improving Speech Recognition of Rare Words

Authors

Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko

Abstract

End-to-end automatic speech recognition (ASR) systems are increasingly popular due to their relative architectural simplicity and competitive performance. However, even though the average accuracy of these systems may be high, performance on rare content words often lags behind that of hybrid ASR systems. To address this problem, second-pass rescoring is often applied, leveraging language modeling. In this paper, we propose a second-pass system with multi-task learning, utilizing semantic targets (such as intent and slot prediction) to improve speech recognition performance. We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1.4% on a general test set and by 2.6% on a rare word test set in terms of relative word error rate (WERR). Our best ASR system with the multi-task LM shows a 4.6% WERR reduction for rare-word recognition compared with an RNN-Transducer-only ASR baseline.
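The abstract describes two ingredients: training the second-pass LM with a multi-task objective (language modeling plus intent and slot prediction), and re-ranking first-pass n-best hypotheses by interpolating ASR and LM scores. A minimal pure-Python sketch of both ideas follows; the function names, weights (`alpha`, `beta`, `lam`), and the toy n-best list are illustrative assumptions, not the paper's actual architecture or settings.

```python
# Illustrative sketch only -- not the paper's implementation.
# (1) Multi-task training loss: the LM objective plus weighted
#     auxiliary semantic losses (intent and slot prediction).
# (2) Second-pass rescoring: interpolate the first-pass ASR
#     log-score with the multi-task LM's log-score and re-rank.

def multitask_loss(lm_loss: float, intent_loss: float, slot_loss: float,
                   alpha: float = 0.1, beta: float = 0.1) -> float:
    """Weighted sum of the language-modeling loss and the two
    auxiliary semantic losses. alpha/beta are assumed weights."""
    return lm_loss + alpha * intent_loss + beta * slot_loss

def rescore(hypotheses, lam: float = 0.5):
    """Re-rank an n-best list. Each hypothesis is a tuple of
    (text, asr_logscore, lm_logscore); higher is better."""
    return max(hypotheses,
               key=lambda h: (1 - lam) * h[1] + lam * h[2])

if __name__ == "__main__":
    # Toy n-best list: the LM score lifts the rare-word hypothesis.
    nbest = [("play the song", -4.0, -6.0),
             ("play the psalm", -4.2, -3.0)]
    print(rescore(nbest)[0])  # -> play the psalm
```

Here the interpolation weight `lam` plays the role the second-pass LM weight usually plays in rescoring; in practice it would be tuned on a development set.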
