Paper Title

Text-Conditioned Transformer for Automatic Pronunciation Error Detection

Authors

Zhan Zhang, Yuehai Wang, Jianyi Yang

Abstract

Automatic pronunciation error detection (APED) plays an important role in the domain of language learning. In previous ASR-based APED methods, the decoded results must be aligned with the target text so that the errors can be located. However, since the decoding process and the alignment process are independent, the prior knowledge of the target text is not fully utilized. In this paper, we propose to use the target text as an extra condition for a Transformer backbone to handle the APED task. The proposed method can output the error states with consideration of the relationship between the input speech and the target text in a fully end-to-end fashion. Meanwhile, as the prior target text is used as a condition for the decoder input, the Transformer works in a feed-forward manner instead of autoregressively at the inference stage, which can significantly boost the speed in actual deployment. We set the ASR-based Transformer as the baseline APED model and conduct several experiments on the L2-Arctic dataset. The results demonstrate that our approach can obtain an 8.4% relative improvement on the $F_1$ score metric.
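The abstract's core idea is that the known target text is fed to the Transformer decoder as a condition, so the model attends over the full text in one forward pass (no autoregressive decoding) and emits a per-phoneme error state. Below is a minimal PyTorch sketch of that idea, assuming illustrative module names, dimensions, and a binary correct/mispronounced output; it is not the authors' released implementation.

```python
# Sketch (assumption, not the paper's code) of a text-conditioned
# Transformer for APED: the decoder input is the target phoneme sequence,
# so inference is a single feed-forward pass and the output is a
# per-phoneme error state. All sizes here are illustrative.
import torch
import torch.nn as nn

class TextConditionedAPED(nn.Module):
    def __init__(self, num_phonemes: int, feat_dim: int = 80,
                 d_model: int = 256, nhead: int = 4, num_layers: int = 3):
        super().__init__()
        self.speech_proj = nn.Linear(feat_dim, d_model)        # acoustic features -> model dim
        self.text_embed = nn.Embedding(num_phonemes, d_model)  # target phonemes as the condition
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.error_head = nn.Linear(d_model, 2)  # per-phoneme state: correct / mispronounced

    def forward(self, speech_feats: torch.Tensor, target_phonemes: torch.Tensor):
        # speech_feats: (B, T_speech, feat_dim); target_phonemes: (B, T_text)
        memory_in = self.speech_proj(speech_feats)
        condition = self.text_embed(target_phonemes)
        # No causal mask: the decoder sees the whole target text at once,
        # so every position is scored in one pass (non-autoregressive).
        hidden = self.transformer(src=memory_in, tgt=condition)
        return self.error_head(hidden)  # (B, T_text, 2) error-state logits

# Toy usage: batch of 2 utterances, 120 speech frames, 15 target phonemes.
model = TextConditionedAPED(num_phonemes=50)
logits = model(torch.randn(2, 120, 80), torch.randint(0, 50, (2, 15)))
print(logits.shape)  # torch.Size([2, 15, 2])
```

Because the decoder input is the fixed target text rather than previously emitted tokens, there is no step-by-step decoding loop at inference, which is what the abstract credits for the deployment speedup.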
