Toroidal Probabilistic Spherical Discriminant Analysis

Paper title

Toroidal Probabilistic Spherical Discriminant Analysis

Authors

Anna Silnova, Niko Brümmer, Albert Swart, Lukáš Burget

Abstract

In speaker recognition, where speech segments are mapped to embeddings on the unit hypersphere, two scoring back-ends are commonly used, namely cosine scoring and PLDA. We have recently proposed PSDA, an analog to PLDA that uses von Mises-Fisher distributions instead of Gaussians. In this paper, we present toroidal PSDA (T-PSDA). It extends PSDA with the ability to model within-speaker and between-speaker variabilities in toroidal submanifolds of the hypersphere. Like PLDA and PSDA, the model allows closed-form scoring and closed-form EM updates for training. On VoxCeleb, we find T-PSDA accuracy on par with cosine scoring, while PLDA accuracy is inferior. On NIST SRE'21, we find that T-PSDA gives large accuracy gains compared to both cosine scoring and PLDA.
