Paper Title

Toroidal Probabilistic Spherical Discriminant Analysis

Authors

Anna Silnova, Niko Brümmer, Albert Swart, Lukáš Burget

Abstract

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within-speaker and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.
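
For readers unfamiliar with the cosine scoring baseline mentioned in the abstract, the following is a minimal sketch of how two embeddings might be length-normalized onto the unit hypersphere and compared. It is not the authors' implementation and says nothing about PLDA, PSDA, or T-PSDA; the function names `length_normalize` and `cosine_score` and the toy two-dimensional vectors are assumptions made for illustration only.

```python
import math


def length_normalize(x):
    """Scale a vector to unit Euclidean norm, i.e. onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [v / norm for v in x]


def cosine_score(enroll, test):
    """Cosine similarity: dot product of the two length-normalized embeddings.

    Hypothetical helper for illustration; a higher score suggests the two
    segments are more likely to come from the same speaker.
    """
    a = length_normalize(enroll)
    b = length_normalize(test)
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    # Toy embeddings (assumed values, not real speaker embeddings).
    same_direction = cosine_score([1.0, 0.0], [2.0, 0.0])
    orthogonal = cosine_score([1.0, 0.0], [0.0, 3.0])
    print(same_direction, orthogonal)
```

Because the score depends only on the angle between the embeddings, it has no trained parameters, which is one reason it serves as a simple baseline against trained back-ends.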
