Magnitude-aware Probabilistic Speaker Embeddings

Paper title

Magnitude-aware Probabilistic Speaker Embeddings

Authors

Nikita Kuzmin, Igor Fedorov, Alexey Sholokhov

Abstract

Recently, hyperspherical embeddings have established themselves as a dominant technique for face and voice recognition. Specifically, Euclidean space vector embeddings are learned to encode person-specific information in their direction while ignoring the magnitude. However, recent studies have shown that the magnitudes of the embeddings extracted by deep neural networks may indicate the quality of the corresponding inputs. This paper explores the properties of the magnitudes of the embeddings related to quality assessment and out-of-distribution detection. We propose a new probabilistic speaker embedding extractor using the information encoded in the embedding magnitude and leverage it in the speaker verification pipeline. We also propose several quality-aware diarization methods and incorporate the magnitudes in those. Our results indicate significant improvements over magnitude-agnostic baselines both in speaker verification and diarization tasks.
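To make the direction/magnitude distinction in the abstract concrete, the following is a minimal sketch, not the authors' extractor. It only shows that cosine scoring, as used with hyperspherical embeddings, depends on direction alone, so the magnitude is left over as a separate scalar that could serve as a quality cue. The function names `split_embedding` and `cosine_score` are hypothetical and chosen for illustration.

```python
import math


def split_embedding(vec):
    """Split a non-zero embedding into (unit direction, magnitude)."""
    magnitude = math.sqrt(sum(x * x for x in vec))
    if magnitude == 0.0:
        raise ValueError("a zero vector has no direction")
    return [x / magnitude for x in vec], magnitude


def cosine_score(a, b):
    """Cosine similarity: compares directions only, ignoring magnitudes."""
    dir_a, _ = split_embedding(a)
    dir_b, _ = split_embedding(b)
    return sum(x * y for x, y in zip(dir_a, dir_b))


# Two toy embeddings with the same direction but different magnitudes.
strong = [3.0, 4.0]
weak = [0.3, 0.4]

_, strong_mag = split_embedding(strong)
_, weak_mag = split_embedding(weak)

# The cosine score cannot tell the two apart; the magnitudes can.
score = cosine_score(strong, weak)
```

Here `score` is 1 up to floating-point rounding, while `strong_mag` and `weak_mag` differ by a factor of ten. A magnitude-agnostic pipeline discards that difference; how the paper actually uses it in verification and diarization is not specified in the abstract.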

