Multichannel Speech Separation with Narrow-band Conformer

Paper Title

Multichannel Speech Separation with Narrow-band Conformer

Authors

Changsheng Quan, Xiaofei Li

Abstract

This work proposes a multichannel speech separation method based on a narrow-band Conformer (named NBC). The network is trained to automatically exploit narrow-band speech separation information, such as the spatial vector clustering of multiple speakers. Specifically, in the short-time Fourier transform (STFT) domain, the network processes each frequency independently, and the same network is shared by all frequencies. For each frequency, the network takes as input the STFT coefficients of the multichannel mixture signals and predicts the STFT coefficients of the separated speech signals. Clustering of spatial vectors shares a similar principle with the self-attention mechanism, in that both compute the similarity of vectors and then aggregate similar vectors. Therefore, the Conformer is expected to be especially suitable for this problem. Experiments show that the proposed narrow-band Conformer achieves better speech separation performance than other state-of-the-art methods by a large margin.
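
The narrow-band data layout described in the abstract can be illustrated with a minimal NumPy sketch. This is not the NBC architecture: the function names (`to_narrow_band`, `self_attention`, `separate`), the array shapes, the weight-free attention step, and the random output projection `w_out` are all assumptions made for illustration. The sketch only shows two ideas from the abstract: each frequency is treated as an independent sequence of frames handled by the same parameters, and self-attention aggregates frames whose feature vectors are similar.

```python
import numpy as np

def to_narrow_band(stft):
    """(channels, freqs, frames) complex -> (freqs, frames, 2*channels) real.

    Each frequency becomes its own sequence; real and imaginary parts of all
    channels are stacked into one feature vector per frame.
    """
    x = np.concatenate([stft.real, stft.imag], axis=0)  # (2C, F, T)
    return np.transpose(x, (1, 2, 0))                    # (F, T, 2C)

def self_attention(x):
    """Scaled dot-product self-attention over frames, without learned weights.

    x: (F, T, D). The leading axis keeps frequencies independent.
    """
    scores = x @ np.transpose(x, (0, 2, 1)) / np.sqrt(x.shape[-1])  # (F, T, T)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ x, weights                     # aggregate similar frames

def separate(stft, w_out, n_speakers):
    """Toy stand-in for the network: attention, then one shared projection."""
    x = to_narrow_band(stft)          # (F, T, 2C)
    h, _ = self_attention(x)          # (F, T, 2C)
    y = h @ w_out                     # (F, T, 2N), same w_out for every frequency
    y = np.transpose(y, (2, 0, 1))    # (2N, F, T)
    return y[:n_speakers] + 1j * y[n_speakers:]  # (N, F, T) complex

# Hypothetical sizes: 8 microphones, 5 frequencies, 20 frames, 2 speakers.
rng = np.random.default_rng(0)
C, F, T, N = 8, 5, 20, 2
stft = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))
w_out = rng.standard_normal((2 * C, 2 * N))  # untrained, random projection
est = separate(stft, w_out, N)
print(est.shape)  # (2, 5, 20)
```

Because frequency is carried as a batch-like axis, changing the input at one frequency leaves the output at every other frequency unchanged, which is the sense in which the processing is narrow-band.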
