Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Paper Title

Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Authors

Fan, Cunhang; Tao, Jianhua; Liu, Bin; Yi, Jiangyan; Wen, Zhengqi

Abstract

Monaural speech dereverberation is a very challenging task because no spatial cues can be used. The task becomes even more challenging when additive noise is present. In this paper, we propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features, based on deep clustering (DC). DC is a state-of-the-art method for speech separation that consists of embedding learning and K-means clustering. The proposed method contains two stages: denoising and dereverberation. At the denoising stage, the DC network is used to extract noise-free deep embedding features. These embedding features are generated from the anechoic speech and the residual reverberation signals. They can represent the inferred spectral masking patterns of the desired signals and thus serve as discriminative features. At the dereverberation stage, instead of the unsupervised K-means clustering algorithm, another supervised neural network is used to estimate the anechoic speech from these deep embedding features. Finally, the denoising and dereverberation stages are optimized together with the joint training method. Experimental results show that the proposed method outperforms the WPE and BLSTM baselines, especially under low-SNR conditions.
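To make the contrast between K-means clustering and a supervised estimator more concrete, here is a minimal NumPy sketch (not the authors' code). It shows the standard deep clustering objective ‖VVᵀ − YYᵀ‖²_F from the original DC formulation, the unsupervised K-means inference step that the proposed method replaces, and a hypothetical supervised mask estimator with a joint loss. Assumptions: every time-frequency bin is assigned to one of two classes, anechoic speech or residual reverberation, following the abstract's description (how noise-dominated bins are handled is not stated). The function names `dc_loss`, `kmeans_masks`, `supervised_mask`, and `joint_loss`, the single linear-plus-sigmoid layer, the mask-based output, and the weight `alpha` are all illustrative choices, because the abstract specifies neither the network architecture nor the exact joint training loss.

```python
import numpy as np


def dc_loss(V: np.ndarray, Y: np.ndarray) -> float:
    """Standard deep clustering objective ||V V^T - Y Y^T||_F^2.

    V: (TF, D) embeddings, one row per time-frequency bin.
    Y: (TF, C) one-hot targets; here C = 2 (anechoic speech vs. residual
       reverberation), an assumption based on the abstract.
    The expansion below avoids building the (TF, TF) affinity matrices.
    """
    return float(
        np.sum((V.T @ V) ** 2)
        - 2.0 * np.sum((V.T @ Y) ** 2)
        + np.sum((Y.T @ Y) ** 2)
    )


def kmeans_masks(V: np.ndarray, n_clusters: int = 2, n_iter: int = 50,
                 seed: int = 0) -> np.ndarray:
    """Unsupervised DC inference: K-means on embeddings -> binary masks (TF, C)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation keeps this toy example deterministic.
    centers = [V[rng.integers(len(V))]]
    for _ in range(1, n_clusters):
        dist = np.min([((V - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(V[dist.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Squared distance from every bin to every center: shape (TF, C).
        d = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = V[labels == k].mean(axis=0)
    return np.eye(n_clusters)[labels]


def supervised_mask(V: np.ndarray, W: np.ndarray, b: float) -> np.ndarray:
    """Hypothetical stand-in for the supervised dereverberation network:
    a linear layer + sigmoid mapping each embedding to a speech mask in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(V @ W + b)))


def joint_loss(V, Y, mask, mix_mag, anechoic_mag, alpha=1.0):
    """Hypothetical joint objective: weighted DC loss + masked-magnitude MSE.
    alpha is an assumed weight, not a value from the paper."""
    recon_err = float(np.mean((mask * mix_mag - anechoic_mag) ** 2))
    return alpha * dc_loss(V, Y) + recon_err


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    TF, D = 200, 8
    Y = np.eye(2)[rng.integers(0, 2, size=TF)]  # toy speech/reverb assignments
    # Toy embeddings: one orthogonal direction per class plus small noise.
    V = Y @ np.eye(2, D) + 0.05 * rng.standard_normal((TF, D))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    masks = kmeans_masks(V)
    agree = np.mean(masks.argmax(1) == Y.argmax(1))
    print("DC loss:", dc_loss(V, Y))
    print("K-means agreement (up to label swap):", max(agree, 1 - agree))
    m = supervised_mask(V, rng.standard_normal(D), 0.0)
    mix, clean = rng.random(TF), rng.random(TF)
    print("joint loss:", joint_loss(V, Y, m, mix, clean))
```

Running the script prints the DC loss on toy embeddings, how well K-means recovers the toy class assignments, and a joint loss value. The sketch only evaluates losses and trains nothing. In the proposed method, the K-means step is not used for dereverberation, and joint training presumably lets the dereverberation loss also update the embedding network.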
