Paper title
The IDLAB VoxSRC-20 Submission: Large Margin Fine-Tuning and Quality-Aware Score Calibration in DNN Based Speaker Verification
Paper authors
Paper abstract
In this paper, we propose and analyse a large margin fine-tuning strategy and a quality-aware score calibration for text-independent speaker verification. Large margin fine-tuning is a secondary training stage for DNN-based speaker verification systems trained with margin-based loss functions. It lets the network create more robust speaker embeddings by allowing longer training utterances to be used in combination with a more aggressive margin penalty. Score calibration is a common practice in speaker verification systems to map output scores to well-calibrated log-likelihood ratios, which can be converted to interpretable probabilities. By including quality features in the calibration system, the decision thresholds of the evaluation metrics become quality-dependent and more consistent across varying trial conditions. Applying both enhancements to the ECAPA-TDNN architecture leads to state-of-the-art results on all publicly available VoxCeleb1 test sets and contributed to our winning submissions in the supervised verification tracks of the VoxCeleb Speaker Recognition Challenge 2020.
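The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes an additive angular margin (AAM-softmax style) penalty as one common example of a margin-based loss, and a linear calibration with a single quality feature. The function names, margin values, weights, and the quality feature are hypothetical and chosen only for illustration.

```python
import math


def aam_target_logit(cos_theta, margin, scale):
    """Target-class logit under an additive angular margin (assumed loss type).

    cos_theta is the cosine between an embedding and its class weight;
    the margin is added to the angle before the cosine is rescaled.
    """
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    return scale * math.cos(theta + margin)


def calibrated_llr(score, quality, w_score, w_quality, bias):
    """Linear calibration: map a raw score plus one quality feature to an LLR.

    The weights and bias are hypothetical; in practice they would be
    fitted on held-out trials (for example with logistic regression).
    """
    return w_score * score + w_quality * quality + bias


def llr_to_posterior(llr, target_prior):
    """Convert a log-likelihood ratio into P(same speaker) for a given prior."""
    prior_log_odds = math.log(target_prior / (1.0 - target_prior))
    return 1.0 / (1.0 + math.exp(-(llr + prior_log_odds)))


# A larger margin lowers the target logit for the same angle, so the
# network is penalised more strongly (illustrative margin values).
small_margin_logit = aam_target_logit(0.8, margin=0.2, scale=30.0)
large_margin_logit = aam_target_logit(0.8, margin=0.5, scale=30.0)

# The same raw score yields a different LLR, and hence a different
# effective decision threshold, when the quality feature changes.
llr_low_quality = calibrated_llr(0.5, 0.0, w_score=10.0, w_quality=0.5, bias=-4.0)
llr_high_quality = calibrated_llr(0.5, 2.0, w_score=10.0, w_quality=0.5, bias=-4.0)
posterior = llr_to_posterior(llr_high_quality, target_prior=0.5)
```

With these placeholder numbers, the trial with the higher quality value receives the higher LLR for an identical raw score, which is the sense in which the decision threshold on the raw score becomes quality-dependent.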