Stereo Speech Enhancement Using Custom Mid-Side Signals and Monaural Processing

Paper Title

Stereo Speech Enhancement Using Custom Mid-Side Signals and Monaural Processing

Authors

Aaron Master, Lie Lu, Nathan Swedlow

Abstract

Speech Enhancement (SE) systems typically operate on monaural input and are used for applications including voice communications and capture cleanup for user-generated content. Recent advancements and changes in the devices used for these applications are likely to lead to an increase in the amount of two-channel content for the same applications. However, SE systems are typically designed for monaural input; stereo results produced using trivial methods such as channel-independent or mid-side processing may be unsatisfactory and may include substantial speech distortions. To address this, we propose a system which creates a novel representation of stereo signals called Custom Mid-Side Signals (CMSS). CMSS allow the benefits of mid-side signals for center-panned speech to be extended to a much larger class of input signals. This in turn allows any existing monaural SE system to operate as an efficient stereo system by processing the custom mid signal. We describe how the parameters needed for CMSS can be efficiently estimated by a component of the spatio-level filtering source separation system. Subjective listening tests using state-of-the-art deep-learning-based SE systems on stereo content with various speech mixing styles show that CMSS processing leads to improved speech quality at approximately half the cost of channel-independent processing.
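The abstract does not give the CMSS equations. The Python sketch below therefore only illustrates the general idea under a stated assumption. It assumes speech is amplitude-panned with constant-power gains, L = cos(θ)·s and R = sin(θ)·s with θ in [0, π/2], and it takes the "custom" mid-side pair to be (L, R) rotated by θ. Under that assumption the panned speech lands entirely in the custom mid. One monaural SE pass then replaces two channel-independent passes, and an orthonormal inverse rotation restores the stereo image. At θ = π/4 (center panning) the custom mid reduces to (L + R)/√2, which is the conventional mid up to a scale factor.

Several names here are hypothetical and are not the authors' implementation: `estimate_pan_angle`, `to_custom_mid_side`, `from_custom_mid_side`, `enhance_stereo`, the `mono_enhance` placeholder, and the `side_gain` knob. The energy-ratio angle estimate is also an assumption. It stands in for the paper's spatio-level filtering component.

```python
import numpy as np


def estimate_pan_angle(left: np.ndarray, right: np.ndarray) -> float:
    """Crude broadband pan-angle estimate from channel energies (hypothetical).

    Assumes one in-phase source with constant-power panning,
    L = cos(theta) * s and R = sin(theta) * s with theta in [0, pi/2],
    so theta = atan2(||R||, ||L||). Stand-in for the paper's SLF-based estimate.
    """
    return float(np.arctan2(np.linalg.norm(right), np.linalg.norm(left)))


def to_custom_mid_side(left: np.ndarray, right: np.ndarray, theta: float):
    """Rotate (L, R) by theta so a source panned at theta lands in the mid."""
    c, s = np.cos(theta), np.sin(theta)
    mid = c * left + s * right     # equals the source itself under the panning model
    side = -s * left + c * right   # the panned source cancels here
    return mid, side


def from_custom_mid_side(mid: np.ndarray, side: np.ndarray, theta: float):
    """Inverse rotation back to (L, R); exact because the rotation is orthonormal."""
    c, s = np.cos(theta), np.sin(theta)
    return c * mid - s * side, s * mid + c * side


def enhance_stereo(left, right, mono_enhance, side_gain=1.0):
    """One monaural SE pass on the custom mid instead of one pass per channel.

    mono_enhance: placeholder for any existing monaural SE system (1-D in, 1-D out).
    side_gain: hypothetical scaling of the custom side (1.0 passes it through).
    """
    theta = estimate_pan_angle(left, right)
    mid, side = to_custom_mid_side(left, right, theta)
    return from_custom_mid_side(mono_enhance(mid), side_gain * side, theta)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    speech = rng.standard_normal(16000)
    theta_true = np.deg2rad(30.0)  # off-center: louder in the left channel
    left, right = np.cos(theta_true) * speech, np.sin(theta_true) * speech

    mid, side = to_custom_mid_side(left, right, estimate_pan_angle(left, right))
    print(np.allclose(mid, speech), np.allclose(side, 0.0))  # True True
```

Running the script prints `True True`. With the speech panned at θ = 30° (left of center), the custom side is numerically empty. The conventional side (L − R)/2 would still carry about 0.18·s of speech. This sketch uses a single broadband angle and passes the custom side through unchanged. A practical system may estimate the parameters per time-frequency region and treat the custom side differently.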
