Title
Investigating Cross-Domain Losses for Speech Enhancement
Authors
Abstract
Recent years have seen a surge in the number of available frameworks for speech enhancement (SE) and recognition. Whether model-based or constructed via deep learning, these frameworks often rely in isolation on either time-domain signals or time-frequency (TF) representations of speech data. In this study, we investigate the advantages of each set of approaches by separately examining their impact on speech intelligibility and quality. Furthermore, we combine the fragmented benefits of time-domain and TF speech representations by introducing two new cross-domain SE frameworks. A quantitative comparative analysis against recent model-based and deep learning SE approaches is performed to illustrate the merit of the proposed frameworks.
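The abstract does not say how the time-domain and TF views are combined. One hypothetical way to picture a cross-domain objective, which is an assumption and not the authors' formulation, is a weighted sum L = alpha * L_time + (1 - alpha) * L_TF. The sketch uses mean absolute error on the waveform for L_time and mean absolute error on Hann-windowed STFT magnitudes for L_TF, computed with a naive DFT so it needs only the standard library. The function names, the weight `alpha`, and the frame and hop sizes are all illustrative choices.

```python
import cmath
import math
from typing import List, Sequence

# Hypothetical sketch of a cross-domain loss; NOT the paper's actual formulation.


def stft_magnitude(x: Sequence[float], frame_len: int = 8, hop: int = 4) -> List[List[float]]:
    """One-sided magnitude spectrogram of Hann-windowed frames via a naive DFT."""
    if len(x) < frame_len:
        raise ValueError("signal must be at least one frame long")
    # Periodic Hann window.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        # Keep bins 0..frame_len//2; the remaining bins mirror these for real input.
        frames.append([
            abs(sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


def time_domain_loss(est: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between estimated and reference waveforms (phase-sensitive)."""
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(ref)


def tf_domain_loss(est: Sequence[float], ref: Sequence[float],
                   frame_len: int = 8, hop: int = 4) -> float:
    """Mean absolute error between STFT magnitudes (phase is discarded)."""
    mag_est = stft_magnitude(est, frame_len, hop)
    mag_ref = stft_magnitude(ref, frame_len, hop)
    diffs = [abs(a - b) for fe, fr in zip(mag_est, mag_ref) for a, b in zip(fe, fr)]
    return sum(diffs) / len(diffs)


def cross_domain_loss(est: Sequence[float], ref: Sequence[float], alpha: float = 0.5,
                      frame_len: int = 8, hop: int = 4) -> float:
    """alpha * time-domain term + (1 - alpha) * TF-domain term (hypothetical weighting)."""
    if len(est) != len(ref):
        raise ValueError("est and ref must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * time_domain_loss(est, ref)
            + (1 - alpha) * tf_domain_loss(est, ref, frame_len, hop))


if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * 0.1 * n) for n in range(64)]
    noisy = [c + 0.1 * (-1) ** n for n, c in enumerate(clean)]
    print(cross_domain_loss(clean, clean))  # 0.0
    print(cross_domain_loss(noisy, clean) > 0.0)  # True
```

Because the magnitude term ignores phase while the waveform term does not, `alpha` trades one view of the signal against the other. A practical training setup would use an FFT routine and batched tensors instead of this per-frame O(N^2) DFT.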