iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

Paper Title

iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

Authors

Li, Haoyu; Fu, Szu-Wei; Tsao, Yu; Yamagishi, Junichi

Abstract

The intelligibility of natural speech is seriously degraded when exposed to adverse noisy environments. In this work, we propose a deep learning-based speech modification method to compensate for the intelligibility loss, with the constraint that the root mean square (RMS) level and duration of the speech signal are maintained before and after modifications. Specifically, we utilize an iMetricGAN approach to optimize the speech intelligibility metrics with generative adversarial networks (GANs). Experimental results show that the proposed iMetricGAN outperforms conventional state-of-the-art algorithms in terms of objective measures, i.e., speech intelligibility in bits (SIIB) and extended short-time objective intelligibility (ESTOI), under a Cafeteria noise condition. In addition, formal listening tests reveal significant intelligibility gains when both noise and reverberation exist.
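
The abstract states that the RMS level of the speech signal is kept the same before and after modification. As a rough illustration of what such a constraint means in practice, the sketch below rescales a modified signal so that its RMS level matches the original. It is not the authors' implementation: the function names `rms` and `match_rms` and the toy sample values are assumptions made for this example, and a real system would operate on full waveforms rather than four samples.

```python
import math


def rms(signal):
    """Root mean square level of a non-empty sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def match_rms(modified, reference, eps=1e-12):
    """Rescale `modified` so its RMS level equals that of `reference`.

    `eps` guards against division by zero when `modified` is silent.
    """
    gain = rms(reference) / max(rms(modified), eps)
    return [gain * x for x in modified]


# Toy example (hypothetical values): the modified signal came out
# louder than the original, so it is scaled back to the same RMS level.
original = [0.1, -0.2, 0.3, -0.1]
modified = [0.5, -0.1, 0.9, -0.7]
equalized = match_rms(modified, original)
```

Because only a single gain is applied, the number of samples, and therefore the duration, is unchanged, which matches the second constraint named in the abstract.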
