A weighted-variance variational autoencoder model for speech enhancement

Paper title

A weighted-variance variational autoencoder model for speech enhancement

Authors

Ali Golmakani, Mostafa Sadeghi, Xavier Alameda-Pineda, Romain Serizel

Abstract

We address speech enhancement based on variational autoencoders, which involves learning a speech prior distribution in the time-frequency (TF) domain. A zero-mean complex-valued Gaussian distribution is usually assumed for the generative model, where the speech information is encoded in the variance as a function of a latent variable. In contrast to this commonly used approach, we propose a weighted variance generative model, where the contribution of each spectrogram time-frame in parameter learning is weighted. We impose a Gamma prior distribution on the weights, which would effectively lead to a Student's t-distribution instead of a Gaussian for speech generative modeling. We develop efficient training and speech enhancement algorithms based on the proposed generative model. Our experimental results on spectrogram auto-encoding and speech enhancement demonstrate the effectiveness and robustness of the proposed approach compared to the standard unweighted variance model.
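To make the claim "a Gamma prior on the weights leads to a Student's t-distribution" concrete, here is a minimal sketch in Python (standard library only). The abstract does not give the exact parameterization, so it assumes that each frame weight w_t scales the precision, i.e. s_ft | z_t, w_t ~ N_c(0, sigma_f^2(z_t) / w_t), with a Gamma(a, b) shape-rate prior on w_t shared by all F frequency bins of frame t. All function names and toy numbers are hypothetical. Integrating w_t out gives a closed-form heavy-tailed (multivariate Student's t-type) frame likelihood, which the code checks against numerical integration. The Gamma posterior mean of w_t then shows how a frame whose energy the decoder variances fail to explain receives a smaller weight. This is an illustration of the modeling idea, not the authors' training or enhancement algorithm.

```python
import math

# Hypothetical setup (assumption, not taken from the paper):
#   s_ft | z_t, w_t ~ N_c(0, sigma_f^2(z_t) / w_t),   w_t ~ Gamma(shape=a, rate=b)
# power[f] = |s_ft|^2 and variances[f] = sigma_f^2(z_t) for one frame t.

def gaussian_frame_loglik(power, variances, w=1.0):
    """log p(s_t | z_t, w_t) for a zero-mean circular complex Gaussian; w=1 is the unweighted model."""
    return sum(math.log(w / (math.pi * v)) - w * p / v for p, v in zip(power, variances))

def student_frame_loglik(power, variances, a, b):
    """log p(s_t | z_t) with w_t integrated out in closed form (Student's t-type marginal)."""
    F = len(power)
    q = sum(p / v for p, v in zip(power, variances))  # sum_f |s_ft|^2 / sigma_f^2
    return (a * math.log(b) - math.lgamma(a) + math.lgamma(a + F)
            - sum(math.log(math.pi * v) for v in variances)
            - (a + F) * math.log(b + q))

def posterior_weight_mean(power, variances, a, b):
    """E[w_t | s_t, z_t]; by conjugacy the posterior is Gamma(a + F, b + q)."""
    q = sum(p / v for p, v in zip(power, variances))
    return (a + len(power)) / (b + q)

def numeric_marginal(power, variances, a, b, w_max=60.0, n=20000):
    """Midpoint-rule check of log of the integral over w of p(s_t | z_t, w) * Gamma(w; a, b)."""
    h = w_max / n
    total = 0.0
    for i in range(n):
        w = (i + 0.5) * h
        log_prior = a * math.log(b) - math.lgamma(a) + (a - 1) * math.log(w) - b * w
        total += math.exp(gaussian_frame_loglik(power, variances, w) + log_prior) * h
    return math.log(total)

if __name__ == "__main__":
    variances = [1.0, 0.5, 2.0]
    clean = [0.8, 0.4, 1.9]     # frame roughly consistent with the predicted variances
    outlier = [8.0, 6.0, 20.0]  # frame with far more energy than predicted
    a = b = 2.0
    print(student_frame_loglik(clean, variances, a, b), numeric_marginal(clean, variances, a, b))
    print(posterior_weight_mean(clean, variances, a, b), posterior_weight_mean(outlier, variances, a, b))
    print(gaussian_frame_loglik(outlier, variances), student_frame_loglik(outlier, variances, a, b))
```

With these toy numbers the outlier frame gets a posterior weight mean of 5/32 versus about 1.10 for the consistent frame, and its Student's t log-likelihood is much less negative than under the unweighted Gaussian. This is one way to read the robustness argument in the abstract.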