Predicting score distribution to improve non-intrusive speech quality estimation

Paper title

Predicting score distribution to improve non-intrusive speech quality estimation

Authors

Abu Zaher Md Faridee, Hannes Gamper

Abstract

Deep noise suppressors (DNS) have become an attractive solution to remove background noise, reverberation, and distortions from speech and are widely used in telephony/voice applications. They are also occasionally prone to introducing artifacts and lowering the perceptual quality of the speech. Subjective listening tests that use multiple human judges to derive a mean opinion score (MOS) are a popular way to measure these models' performance. Deep neural network based non-intrusive MOS estimation models have recently emerged as a popular cost-efficient alternative to these tests. These models are trained with only the MOS labels, often discarding the secondary statistics of the opinion scores. In this paper, we investigate several ways to integrate the distribution of opinion scores (e.g. variance, histogram information) to improve the MOS estimation performance. Our model is trained on a corpus of 419K denoised samples by 320 different DNS models and model variations and evaluated on 18K test samples from DNSMOS. We show that with very minor modification of a single task MOS estimation pipeline, these freely available labels can provide up to a 0.016 RMSE and 1% SRCC improvement.
