On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement

Title

On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement

Authors

Pandey, Ashutosh; Wang, DeLiang

Abstract

In recent years, supervised approaches using deep neural networks (DNNs) have become the mainstream for speech enhancement. It has been established that DNNs generalize well to untrained noises and speakers if trained using a large number of noises and speakers. However, we find that DNNs fail to generalize to new speech corpora in low signal-to-noise ratio (SNR) conditions. In this work, we establish that the lack of generalization is mainly due to channel mismatch, i.e., different recording conditions between the trained and untrained corpora. Additionally, we observe that traditional channel normalization techniques are not effective in improving cross-corpus generalization. Further, we evaluate publicly available datasets that are promising for generalization and find one particular corpus to be significantly better than the others. Finally, we find that using a smaller frame shift in short-time processing of speech can significantly improve cross-corpus generalization. The proposed techniques to address cross-corpus generalization include channel normalization, a better training corpus, and a smaller frame shift in the short-time Fourier transform (STFT). Together, these techniques significantly improve objective intelligibility and quality scores on untrained corpora.
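The abstract names two concrete signal-processing knobs: the STFT frame shift and channel normalization. The sketch below is a minimal illustration, assuming NumPy, a Hann window, a 16 kHz sampling rate, and frame sizes chosen for demonstration (none of these are taken from the paper). It shows how a smaller frame shift yields more overlapping frames for the same signal, and how a conventional log-spectral mean subtraction (the spectral-domain counterpart of cepstral mean normalization, CMN) cancels a fixed channel gain. The helper names `stft` and `log_mean_normalize` are hypothetical, and the normalization shown is the traditional kind the authors report as ineffective, not necessarily the normalization they propose.

```python
import numpy as np


def stft(x: np.ndarray, frame_len: int = 512, frame_shift: int = 128) -> np.ndarray:
    """Complex STFT of a 1-D signal using a Hann analysis window.

    frame_len and frame_shift are in samples. The defaults are illustrative
    assumptions, not settings taken from the paper.
    """
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    # A smaller frame_shift means more overlap between consecutive frames,
    # hence more frames per second of audio.
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    frames = np.stack([
        x[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    # One row per frame, frame_len // 2 + 1 frequency bins per row.
    return np.fft.rfft(frames, axis=-1)


def log_mean_normalize(spec: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Hypothetical helper: log-spectral mean subtraction (CMN-style).

    A time-invariant channel acts roughly as a per-frequency gain on the
    magnitude spectrum, which becomes an additive offset in the log domain;
    subtracting each bin's mean over time removes that offset.
    """
    log_mag = np.log(np.abs(spec) + eps)
    return log_mag - log_mag.mean(axis=0, keepdims=True)


# Usage: 1 s of white noise at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)

print(stft(x, frame_len=512, frame_shift=256).shape)  # (61, 257): 16 ms shift
print(stft(x, frame_len=512, frame_shift=64).shape)   # (243, 257): 4 ms shift

# A fixed gain (the simplest "channel") shifts every log-magnitude bin by
# log(3); the per-bin mean subtraction cancels it up to the eps floor.
print(np.allclose(log_mean_normalize(stft(x)),
                  log_mean_normalize(stft(3.0 * x)), atol=1e-3))
```

Shrinking `frame_shift` from 256 to 64 samples roughly quadruples the frame rate, and the computation with it, while the fixed `frame_len` keeps each frame's frequency resolution unchanged.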
