Time-Domain Speech Enhancement for Robust Automatic Speech Recognition

Paper title

Time-Domain Speech Enhancement for Robust Automatic Speech Recognition

Authors

Yufeng Yang, Ashutosh Pandey, DeLiang Wang

Abstract

It has been shown that the intelligibility of noisy speech can be improved by speech enhancement algorithms. However, speech enhancement has not been established as an effective frontend for robust automatic speech recognition (ASR) in noisy conditions, compared to an ASR model trained directly on noisy speech. The divide between speech enhancement and ASR impedes the progress of robust ASR systems, especially as speech enhancement has made big strides in recent years. In this work, we focus on eliminating this divide with a time-domain enhancement model based on an attentive recurrent network (ARN). The proposed system fully decouples speech enhancement from the acoustic model, which is trained only on clean speech. Results on the CHiME-2 corpus show that ARN-enhanced speech translates to improved ASR results. The proposed system achieves an average word error rate of $6.28\%$, a $19.3\%$ relative improvement over the previous best.
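The abstract reports an absolute word error rate (WER) alongside a relative improvement, and the two are easy to conflate. The sketch below shows how they relate, assuming "relative" means the reduction divided by the previous best WER. The function names are hypothetical, and the implied baseline (roughly 7.8%) is derived under that assumption; it is not a figure stated in the abstract.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: (baseline - new) / baseline * 100."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer * 100.0


def implied_baseline(new_wer: float, relative_reduction_pct: float) -> float:
    """Invert the formula above to recover the baseline WER.

    Assumes the relative reduction was computed against the baseline,
    which is the usual convention but is not confirmed by the abstract.
    """
    if not 0 <= relative_reduction_pct < 100:
        raise ValueError("relative_reduction_pct must be in [0, 100)")
    return new_wer / (1.0 - relative_reduction_pct / 100.0)


if __name__ == "__main__":
    # Figures from the abstract: 6.28% WER, 19.3% relative improvement.
    baseline = implied_baseline(6.28, 19.3)
    print(f"implied previous best WER: {baseline:.2f}%")
    print(f"check, relative reduction: {relative_reduction(baseline, 6.28):.1f}%")
```

Under this reading, the absolute gain is about 1.5 percentage points, which is much smaller than the 19.3% relative figure might suggest at first glance.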
