An Initialization Scheme for Meeting Separation with Spatial Mixture Models

Paper Title

An Initialization Scheme for Meeting Separation with Spatial Mixture Models

Authors

Christoph Boeddeker, Tobias Cord-Landwehr, Thilo von Neumann, Reinhold Haeb-Umbach

Abstract

Acoustic beamforming supported by a spatial mixture model (SMM) has been used extensively to separate simultaneously active speakers. However, it has hardly been considered for the separation of meeting data, which are characterized by long recordings and only partially overlapping speech. In this contribution, we show that the fact that often only a single speaker is active can be exploited for a clever initialization of an SMM that employs time-varying class priors. In experiments on LibriCSS, we show that the proposed initialization scheme achieves a significantly lower Word Error Rate (WER) on a downstream speech recognition task than a random initialization of the class probabilities drawn from a Dirichlet distribution. With the only requirement being that the number of speakers is known, we obtain a WER of 5.9%, which is comparable to the best reported WER on this data set. Furthermore, the speaker activity estimated by the mixture model serves as a diarization based on spatial information.
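The random baseline mentioned in the abstract, drawing class probabilities from a Dirichlet distribution, can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `dirichlet_init`, the symmetric concentration parameter `alpha`, and the frame and class counts are all assumptions made for the example. It relies on the standard fact that a Dirichlet sample can be obtained by normalizing independent Gamma draws.

```python
import random


def dirichlet_init(num_frames, num_classes, alpha=1.0, seed=0):
    """Draw one class-probability vector per time frame.

    Hypothetical helper: each vector is a sample from a symmetric
    Dirichlet distribution with concentration `alpha`.
    """
    rng = random.Random(seed)
    init = []
    for _ in range(num_frames):
        # Normalizing independent Gamma(alpha, 1) draws gives a Dirichlet sample.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_classes)]
        total = sum(draws)
        init.append([d / total for d in draws])
    return init


# Assumed sizes for illustration only: 5 time frames, 3 mixture classes.
priors = dirichlet_init(num_frames=5, num_classes=3)
for frame_index, frame_probs in enumerate(priors):
    print(frame_index, [round(p, 3) for p in frame_probs])
```

Each row is a valid probability vector over the classes for one time frame. The scheme proposed in the paper replaces such a random start with one that exploits segments where only a single speaker is active; the abstract does not give the details, so they are not sketched here.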
