Paper title
The SJTU System for Short-duration Speaker Verification Challenge 2021
Authors
Abstract
This paper presents the SJTU system for both the text-dependent and text-independent tasks of the Short-duration Speaker Verification (SdSV) Challenge 2021. In this challenge, we explore several strong embedding extractors to obtain robust speaker embeddings. For the text-independent task, we explore language-dependent adaptive S-norm to improve system performance under the cross-lingual verification condition. For the text-dependent task, we mainly focus on in-domain fine-tuning strategies built on a model pre-trained on large-scale out-of-domain data. To improve the discrimination between different speakers uttering the same phrase, we propose several novel phrase-aware fine-tuning strategies and a phrase-aware neural PLDA, which further improve system performance. Finally, we fuse the scores of the different systems; our fusion systems achieve 0.0473 on Task 1 (rank 3) and 0.0581 on Task 2 (rank 8) on the primary evaluation metric.
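The abstract does not spell out how the language-dependent adaptive S-norm is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the commonly described adaptive symmetric normalization, in which a trial score is normalized by the mean and standard deviation of the top-N cohort scores on the enrollment side and on the test side. The language-dependent part is an assumption made here for illustration: each side's cohort is restricted to cohort utterances in that side's language. All names (`adaptive_snorm`, `language_dependent_snorm`, `cohort_langs`, `top_n`) are hypothetical.

```python
import numpy as np


def adaptive_snorm(score, enroll_cohort_scores, test_cohort_scores, top_n=300):
    """Adaptive symmetric normalization of a single trial score.

    enroll_cohort_scores: scores of the enrollment embedding against the cohort.
    test_cohort_scores:   scores of the test embedding against the cohort.
    Only the top_n highest cohort scores on each side are used for the statistics.
    """
    def top_stats(cohort_scores):
        cohort_scores = np.asarray(cohort_scores, dtype=float)
        if cohort_scores.size == 0:
            raise ValueError("cohort is empty")
        n = min(top_n, cohort_scores.size)
        top = np.sort(cohort_scores)[-n:]      # the n largest scores
        return top.mean(), top.std() + 1e-8    # epsilon avoids division by zero

    mu_e, sd_e = top_stats(enroll_cohort_scores)
    mu_t, sd_t = top_stats(test_cohort_scores)
    return 0.5 * ((score - mu_e) / sd_e + (score - mu_t) / sd_t)


def language_dependent_snorm(score, enroll_cohort_scores, test_cohort_scores,
                             cohort_langs, enroll_lang, test_lang, top_n=300):
    """ASSUMPTION (not from the paper): restrict each side's cohort to
    cohort utterances whose language label matches that side's language."""
    langs = np.asarray(cohort_langs)
    e_scores = np.asarray(enroll_cohort_scores, dtype=float)[langs == enroll_lang]
    t_scores = np.asarray(test_cohort_scores, dtype=float)[langs == test_lang]
    return adaptive_snorm(score, e_scores, t_scores, top_n=top_n)
```

In a cross-lingual trial, `enroll_lang` and `test_lang` would differ, so each side draws its normalization statistics from a different language-matched subset of the cohort. Whether the actual system selects cohorts this way is not stated in the abstract.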