Paper Title

Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input

Authors

Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, Helen Meng

Abstract

Non-autoregressive (NAR) transformer models have achieved significant inference speedup, but at the cost of inferior accuracy compared to autoregressive (AR) models, in automatic speech recognition (ASR). Most NAR transformers take as decoder input either a fixed-length sequence filled with MASK tokens or a redundant sequence copied from encoder states; such inputs cannot provide efficient target-side information, leading to accuracy degradation. To address this problem, we propose a CTC-enhanced NAR transformer, which generates the target sequence by refining the predictions of the CTC module. Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding than a strong AR baseline, with only 0.0 ~ 0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
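The core idea (generating a decoder input by refining CTC predictions rather than using MASK-only or copied-encoder inputs) can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' code: the token names (`<blank>`, `<mask>`), the confidence threshold, and the max-pooling of per-frame confidences are assumptions for the example.

```python
# Hypothetical sketch of building a CTC-enhanced NAR decoder input:
# 1) collapse the frame-level CTC path (merge repeats, drop blanks),
# 2) replace low-confidence tokens with <mask> so the NAR decoder
#    can refine exactly those positions.
BLANK = "<blank>"
MASK = "<mask>"

def ctc_collapse(frame_tokens):
    """Collapse a frame-level CTC path: merge repeated tokens, drop blanks."""
    out, prev = [], None
    for tok in frame_tokens:
        if tok != prev and tok != BLANK:
            out.append(tok)
        prev = tok
    return out

def build_decoder_input(frame_tokens, frame_confidences, threshold=0.9):
    """Collapse the CTC path, keeping the max per-frame confidence for each
    collapsed token, then mask tokens whose confidence falls below threshold."""
    out, confs, prev = [], [], None
    for tok, c in zip(frame_tokens, frame_confidences):
        if tok != prev and tok != BLANK:
            out.append(tok)
            confs.append(c)
        elif tok == prev and tok != BLANK and confs:
            confs[-1] = max(confs[-1], c)  # same token spans several frames
        prev = tok
    return [t if c >= threshold else MASK for t, c in zip(out, confs)]

frames = ["h", "h", BLANK, "e", "l", BLANK, "l", "o"]
confs = [0.95, 0.90, 0.99, 0.50, 0.92, 0.99, 0.93, 0.97]
print(ctc_collapse(frames))              # ['h', 'e', 'l', 'l', 'o']
print(build_decoder_input(frames, confs))  # ['h', '<mask>', 'l', 'l', 'o']
```

The masked sequence already has the correct target length and carries the high-confidence target-side tokens, which is what distinguishes this decoder input from a pure MASK-token sequence.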
