Paper title
Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Paper authors
Paper abstract
Masked language models (MLMs) have been widely used for understanding tasks, e.g., BERT. Recently, MLMs have also been used for generation tasks; the most popular example in speech is Mask-CTC for non-autoregressive speech recognition. In this paper, we take one step further and explore using an MLM as a non-autoregressive spell correction (SC) model for a transformer-transducer (TT), denoted MLM-SC. Our initial experiments show that MLM-SC provides no improvement on LibriSpeech data. The problem might be the choice of modeling units (word pieces) and the inaccuracy of the TT confidence scores for English data. To solve this, we propose a mask sample decoding (MS-decode) method in which each token selected for masking may instead be kept unmasked, compensating for the inaccurate confidence scores. As a result, we reduce the WER of a streaming TT from 7.6% to 6.5% on the LibriSpeech test-other data and the CER from 7.3% to 6.1% on the AISHELL test data.
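
For illustration, below is a minimal Python sketch of the mask-sampling idea described in the abstract, assuming the TT supplies a per-token confidence score in [0, 1] and an MLM that fills all masked positions in one non-autoregressive pass. The function names (sample_mask_patterns, ms_decode, mlm_fill, score_hypothesis), the sampling rule (mask each token with probability 1 - confidence), and the rescoring step are hypothetical stand-ins for exposition, not the paper's exact formulation.

    import random

    MASK = "<mask>"

    def sample_mask_patterns(tokens, confidences, num_samples, seed=0):
        # Instead of deterministically masking every low-confidence token
        # (as in plain Mask-CTC-style correction), sample whether each
        # token is masked: a low-confidence token is usually masked but
        # may also survive, which compensates for unreliable scores.
        rng = random.Random(seed)
        patterns = []
        for _ in range(num_samples):
            patterns.append([MASK if rng.random() > c else t
                             for t, c in zip(tokens, confidences)])
        return patterns

    def ms_decode(tokens, confidences, mlm_fill, score_hypothesis,
                  num_samples=8):
        # Fill each sampled mask pattern with the MLM (one parallel pass
        # per pattern) and keep the highest-scoring corrected hypothesis;
        # the unmodified TT hypothesis stays in the race as a fallback.
        best, best_score = tokens, score_hypothesis(tokens)
        for pattern in sample_mask_patterns(tokens, confidences, num_samples):
            candidate = mlm_fill(pattern)
            s = score_hypothesis(candidate)
            if s > best_score:
                best, best_score = candidate, s
        return best

A call might look like ms_decode(hyp_tokens, hyp_confidences, mlm_fill, score_hypothesis), where mlm_fill wraps the MLM-SC model and score_hypothesis is any sequence-level scorer; both are assumed interfaces here. Sampling several patterns lets the decoder hedge against a confidence estimate that wrongly trusts or distrusts a token, which is the failure mode the abstract attributes to English TT confidence scores.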