Paper Title
Phase Continuity: Learning Derivatives of Phase Spectrum for Speech Enhancement
Paper Authors
Abstract
Modern neural speech enhancement models usually include various forms of phase information in their training loss terms, either explicitly or implicitly. However, these loss terms are typically designed to reduce the distortion of phase spectrum values at specific frequencies, and as a result they do not significantly affect the quality of the enhanced speech. In this paper, we propose an effective phase reconstruction strategy for neural speech enhancement that can operate in noisy environments. Specifically, we introduce a phase continuity loss that considers relative phase variations across the time and frequency axes. By including this phase continuity loss in a state-of-the-art neural speech enhancement system trained with a reconstruction loss and a number of magnitude spectral losses, we show that our proposed method further improves the quality of enhanced speech signals over the baseline, especially when training is done jointly with a magnitude spectrum loss.
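To make the idea of comparing relative phase variations concrete, the following is a minimal NumPy sketch of one possible derivative-based phase loss. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function names (`wrap`, `phase_derivatives`, `phase_continuity_loss`), the use of first-order finite differences along each axis, the `(frequency, time)` array layout, and the `1 - cos` distance are hypothetical choices and may differ from the authors' definition.

```python
import numpy as np


def wrap(x):
    """Wrap angles in radians to the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi


def phase_derivatives(phase):
    """Finite-difference phase derivatives of a (freq, time) phase array.

    Returns the wrapped difference between adjacent frames (time axis)
    and between adjacent bins (frequency axis).
    """
    d_time = wrap(np.diff(phase, axis=1))
    d_freq = wrap(np.diff(phase, axis=0))
    return d_time, d_freq


def phase_continuity_loss(est_phase, ref_phase):
    """Illustrative loss on phase derivatives (an assumption, not the paper's formula).

    Compares how the phase changes along time and frequency rather than
    the phase values themselves. The 1 - cos distance is insensitive to
    2*pi ambiguities in the derivative differences.
    """
    est_dt, est_df = phase_derivatives(est_phase)
    ref_dt, ref_df = phase_derivatives(ref_phase)
    loss_time = np.mean(1.0 - np.cos(est_dt - ref_dt))
    loss_freq = np.mean(1.0 - np.cos(est_df - ref_df))
    return loss_time + loss_freq
```

Because only differences between neighbouring time-frequency points enter this sketch, adding a constant offset to the whole estimated phase spectrum leaves the loss unchanged, whereas a loss on the absolute phase value at each bin would penalize it. In a real training setup the same computation would be written in an automatic-differentiation framework and added to the reconstruction and magnitude spectral losses.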