Paper Title


Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement

Authors

Ryosuke Sawata, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Abstract


Although deep neural network (DNN)-based speech enhancement (SE) methods outperform the previous non-DNN-based ones, they often degrade the perceptual quality of generated outputs. To tackle this problem, we introduce a DNN-based generative refiner, Diffiner, aiming to improve perceptual speech quality pre-processed by an SE method. We train a diffusion-based generative model by utilizing a dataset consisting of clean speech only. Then, our refiner effectively mixes clean parts newly generated via denoising diffusion restoration into the degraded and distorted parts caused by a preceding SE method, resulting in refined speech. Once our refiner is trained on a set of clean speech, it can be applied to various SE methods without additional training specialized for each SE module. Therefore, our refiner can be a versatile post-processing module w.r.t. SE methods and has high potential in terms of modularity. Experimental results show that our method improved perceptual speech quality regardless of the preceding SE methods used.
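The abstract describes starting a denoising diffusion process from the output of a preceding SE method rather than from pure noise, so that newly generated clean content stays anchored to the enhanced speech. The toy sketch below illustrates that general idea only; it is not the authors' implementation. The denoiser is a placeholder (a trained network would predict the noise component), and the schedule and step rule follow a generic DDPM-style reverse process, all of which are assumptions for illustration.

```python
import numpy as np

def toy_denoiser(x, t):
    # Placeholder for a trained network that predicts the noise in x at
    # step t; here it returns zeros so the sketch stays self-contained.
    return np.zeros_like(x)

def refine(se_output, num_steps=50, seed=0):
    """Run a generic DDPM-style reverse process seeded from an SE output."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, num_steps)   # assumed noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start the reverse chain from a partially noised SE output rather
    # than pure noise, so refinement stays close to the enhanced speech.
    t0 = num_steps - 1
    x = (np.sqrt(alpha_bars[t0]) * se_output
         + np.sqrt(1.0 - alpha_bars[t0]) * rng.standard_normal(se_output.shape))

    for t in range(t0, -1, -1):
        eps = toy_denoiser(x, t)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / np.sqrt(alphas[t])  # posterior mean estimate
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x

refined = refine(np.zeros(16))
print(refined.shape)  # (16,)
```

Because the chain is seeded from the SE output and conditioned on it only through that initialization, the same pretrained generative model can, in principle, be reused behind different SE front-ends, which is the modularity property the abstract highlights.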
