Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net

Paper title

Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net

Authors

Hyeong-Seok Choi, Hoon Heo, Jie Hwan Lee, Kyogu Lee

Abstract

In this work, we tackle a denoising and dereverberation problem with a single-stage framework. Although denoising and dereverberation may be considered two separate, challenging tasks, each typically requiring its own module, we show that a single deep network can be shared to solve both. To this end, we propose a new masking method called the phase-aware beta-sigmoid mask (PHM), which reuses the estimated magnitude values to estimate the clean phase by respecting the triangle inequality in the complex domain between three signal components: the mixture, the source, and the rest. Two PHMs are used to handle the direct and reverberant sources, which allows the proportion of reverberation in the enhanced speech to be controlled at inference time. In addition, to improve speech enhancement performance, we propose a new time-domain loss function and show a reasonable performance gain compared to an MSE loss in the complex domain. Finally, to achieve real-time inference, we propose an optimization strategy for U-Net that reduces the computational overhead by up to 88.9% compared to the naïve version.
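
The triangle relation mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it only assumes that the mixture is the complex sum of a source and a remainder, so that the three magnitudes form a triangle and the law of cosines gives the size of the phase offset between mixture and source. The function name `phase_from_magnitudes` and the `sign` argument are hypothetical. Magnitudes alone cannot determine the direction of the rotation, so the sketch takes it as an input.

```python
import numpy as np

def phase_from_magnitudes(mixture, src_mag, rest_mag, sign=1.0):
    """Illustrative only: rebuild a complex source estimate from magnitudes.

    Assumes mixture = source + rest in the complex domain, so the three
    magnitudes form a triangle. `sign` (+1 or -1) is a hypothetical input
    that picks the rotation direction, which magnitudes cannot reveal.
    """
    mix_mag = np.abs(mixture)
    eps = 1e-12  # guards against division by zero in silent bins
    # Law of cosines: angle between the mixture and source vectors.
    cos_delta = (mix_mag**2 + src_mag**2 - rest_mag**2) / (
        2.0 * mix_mag * src_mag + eps
    )
    # Clip so magnitudes that slightly violate the triangle inequality
    # still give a valid angle.
    cos_delta = np.clip(cos_delta, -1.0, 1.0)
    delta = sign * np.arccos(cos_delta)
    # Rotate the mixture phase by the offset and apply the source magnitude.
    return src_mag * np.exp(1j * (np.angle(mixture) + delta))

# Toy check with a known decomposition: X = S + N.
S = np.array([1.0 + 2.0j])
N = np.array([2.0 - 1.0j])
X = S + N
S_hat = phase_from_magnitudes(X, np.abs(S), np.abs(N), sign=1.0)
```

In this toy case the true source leads the mixture in phase, so `sign=1.0` recovers `S`. With the opposite sign the estimate is the mirror image of `S` about the mixture direction, which is why a sign estimate is needed in addition to the magnitudes.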
