Residual-Guided Non-Intrusive Speech Quality Assessment

Paper Title

Residual-Guided Non-Intrusive Speech Quality Assessment

Authors

Ye, Zhe; Chen, Jiahao; Yan, Diqun

Abstract

This paper proposes an approach to improving non-intrusive speech quality assessment (NI-SQA) based on the residual between impaired speech and its enhanced version. The main difficulty of the task is a lack of information, because no corresponding reference speech is available. We generate enhanced speech from the impaired speech to compensate for the absent reference audio, then pair the residual information with the impaired speech. Compared with feeding the impaired speech directly into the model, the residual can bring extra useful information from the contrast introduced by enhancement. The human ear is sensitive to certain noises in a way that deep learning models are not, so the Mean Opinion Score (MOS) predicted by a model does not fit subjective perception well and deviates from it. The residual is closely related to the reference speech and therefore improves the ability of deep learning models to predict MOS. Experimental results show that, during training, a model paired with residuals reaches better evaluation metrics more quickly under the same conditions. Furthermore, our final results improve by 31.3 percent in PLCC and 14.1 percent in RMSE.
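
The pipeline described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function `enhance_placeholder` is a hypothetical stand-in for a real speech enhancement model, the residual is assumed to be the sample-wise difference (impaired minus enhanced), and "pairing" is assumed to mean stacking the impaired signal and the residual as two input channels. PLCC and RMSE are computed in the usual way, as the Pearson linear correlation coefficient and the root mean square error between predicted and subjective MOS.

```python
import numpy as np


def enhance_placeholder(impaired: np.ndarray, kernel: int = 5) -> np.ndarray:
    """Hypothetical stand-in for a speech enhancement model.

    A moving-average smoother is used only so the sketch runs end to end;
    the abstract does not say which enhancement model is used.
    """
    window = np.ones(kernel) / kernel
    return np.convolve(impaired, window, mode="same")


def pair_with_residual(impaired: np.ndarray, enhanced: np.ndarray) -> np.ndarray:
    """Stack the impaired signal with its residual as a 2-channel input.

    Assumption: residual = impaired - enhanced, and pairing = channel stacking.
    """
    residual = impaired - enhanced
    return np.stack([impaired, residual], axis=0)


def plcc(pred, target) -> float:
    """Pearson linear correlation coefficient between predicted and true MOS."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.corrcoef(pred, target)[0, 1])


def rmse(pred, target) -> float:
    """Root mean square error between predicted and true MOS."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((pred - target) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    impaired = rng.standard_normal(16000)   # one second of synthetic audio at 16 kHz
    enhanced = enhance_placeholder(impaired)
    model_input = pair_with_residual(impaired, enhanced)
    print(model_input.shape)                # two channels: impaired and residual
```

In this sketch the two-channel array would be passed to a MOS predictor in place of the impaired signal alone, and `plcc` and `rmse` would then compare the predictions against listener ratings.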
