Paper Title

MDNet: Learning Monaural Speech Enhancement from Deep Prior Gradient

Paper Authors

Andong Li, Chengshi Zheng, Ziyang Zhang, Xiaodong Li

Paper Abstract

While traditional model-based statistical signal processing methods can derive optimal estimators under specific statistical assumptions, current learning-based methods further push the performance upper bound via deep neural networks, but at the expense of high encapsulation and a lack of adequate interpretability. Standing at the intersection of traditional model-based methods and learning-based methods, we propose a model-driven approach based on the maximum a posteriori (MAP) framework, termed MDNet, for single-channel speech enhancement. Specifically, the original problem is formulated as joint posterior estimation w.r.t. the speech and noise components. Different from previous manual assumptions on the prior terms, we propose to model the prior distributions with networks so that they can be learned from the training data. The framework adopts an unfolding structure, and in each step the target parameters are progressively estimated through explicit gradient descent operations. Besides, another network serves as a fusion module to further refine the previous speech estimate. The experiments are conducted on the WSJ0-SI84 and Interspeech 2020 DNS-Challenge datasets, and quantitative results show that the proposed approach outperforms previous state-of-the-art baselines.
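
To make the unfolding idea in the abstract more concrete, below is a minimal, hypothetical PyTorch sketch of a MAP-style unrolled estimator in which each step applies an explicit gradient-descent update whose prior gradient is predicted by a small network. The names (PriorGradientNet, UnrolledMAPEnhancer), the quadratic data-fidelity term, the waveform-domain shapes, and the learned step sizes are all illustrative assumptions; they do not reproduce the authors' actual MDNet architecture, which additionally refines previous speech estimates with a fusion module.

```python
# Hypothetical sketch of an unfolded MAP/gradient-descent enhancer.
# Not the authors' MDNet: module names, shapes, and the simple quadratic
# data-fidelity term are illustrative assumptions only.
import torch
import torch.nn as nn


class PriorGradientNet(nn.Module):
    """Small network that predicts the gradient of a learned log-prior."""

    def __init__(self, channels: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=5, padding=2),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


class UnrolledMAPEnhancer(nn.Module):
    """Unfolds K explicit gradient-descent steps on a MAP-style objective.

    Each step combines a quadratic data-fidelity gradient (s - y) with a
    learned prior gradient; a per-step scalar step size is also learned.
    """

    def __init__(self, num_steps: int = 5):
        super().__init__()
        self.prior_grads = nn.ModuleList(
            [PriorGradientNet() for _ in range(num_steps)]
        )
        self.step_sizes = nn.Parameter(torch.full((num_steps,), 0.1))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: noisy input of shape (batch, 1, time); start from the noisy mixture.
        s = y
        for k, prior_grad in enumerate(self.prior_grads):
            data_grad = s - y                  # gradient of 0.5 * ||s - y||^2
            grad = data_grad + prior_grad(s)   # data fidelity + learned prior gradient
            s = s - self.step_sizes[k] * grad  # explicit gradient-descent update
        return s


if __name__ == "__main__":
    model = UnrolledMAPEnhancer(num_steps=5)
    noisy = torch.randn(2, 1, 16000)   # two 1-second waveforms at 16 kHz
    enhanced = model(noisy)
    print(enhanced.shape)              # torch.Size([2, 1, 16000])
```

In this toy setup, the prior-gradient networks play the role of the learned priors described in the abstract, replacing hand-crafted statistical assumptions, while the unrolled loop mirrors the step-wise parameter estimation via explicit gradient descent.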
