Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification

Title

Exploring the Use of an Unsupervised Autoregressive Model as a Shared Encoder for Text-Dependent Speaker Verification

Authors

Ravi, Vijay; Fan, Ruchao; Afshan, Amber; Lu, Huanhua; Alwan, Abeer

Abstract

In this paper, we propose a novel way of addressing text-dependent automatic speaker verification (TD-ASV) using a shared encoder with task-specific decoders. An autoregressive predictive coding (APC) encoder is pre-trained in an unsupervised manner on both out-of-domain (LibriSpeech, VoxCeleb) and in-domain (DeepMine) unlabeled datasets to learn generic, high-level feature representations that encapsulate both speaker and phonetic content. Two task-specific decoders are then trained on labeled datasets to classify speakers (SID) and phrases (PID). Speaker embeddings extracted from the SID decoder are scored using probabilistic linear discriminant analysis (PLDA), and the SID and PID systems are fused at the score level. On the cross-lingual DeepMine dataset, our system achieves a 51.9% relative improvement in minimum detection cost function (minDCF) over the fully supervised x-vector baseline. However, the i-vector/HMM method outperforms the proposed APC encoder-decoder system. Fusing the x-vector/PLDA baseline scores with the SID/PLDA scores prior to PID fusion further improves performance by 15%, indicating that the proposed approach is complementary to the x-vector system. We show that the proposed approach can leverage large, unlabeled, data-rich domains and learn speech patterns independent of downstream tasks. Such a system can provide competitive performance in domain-mismatched scenarios where test data come from data-scarce domains.
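To make the unsupervised pre-training step concrete, here is a minimal sketch of the APC objective. It assumes the formulation from the original APC work: after reading frames up to time t, the encoder predicts the frame n steps ahead, and it is trained with an L1 loss. The function name `apc_l1_loss`, the default shift `n = 3`, and the toy features are hypothetical. The abstract does not state the shift or the acoustic features used.

```python
from typing import List, Sequence

Frame = Sequence[float]


def apc_l1_loss(frames: List[Frame], predictions: List[Frame], n: int = 3) -> float:
    """Mean absolute error between predictions[t] and frames[t + n].

    predictions[t] is the encoder output after reading frames[0..t]
    (one prediction per frame). The last n predictions have no future
    target, so they are ignored.
    """
    if len(predictions) != len(frames):
        raise ValueError("expected one prediction per input frame")
    if n < 1 or n >= len(frames):
        raise ValueError("shift n must satisfy 1 <= n < number of frames")
    total, count = 0.0, 0
    for t in range(len(frames) - n):
        target = frames[t + n]
        pred = predictions[t]
        # L1 distance summed over feature dimensions.
        total += sum(abs(p - y) for p, y in zip(pred, target))
        count += len(target)
    return total / count
```

Predicting n frames into the future, not the next frame, forces the encoder to capture slowly varying structure such as speaker and phonetic identity instead of copying local smoothness. A predictor that just repeats the current frame is penalized by how much the signal changes over n frames.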
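PLDA scoring of the SID embeddings compares two hypotheses for a trial: both embeddings come from the same speaker, or they come from different speakers. The result is a log-likelihood ratio. The sketch below is a simplified two-covariance PLDA. It assumes embeddings that are already centered and whitened, so each dimension is independent with between-speaker variance `b` and within-speaker variance `w`. Practical systems estimate full covariance matrices with EM, and the abstract does not describe the paper's PLDA configuration. The function name `plda_llr` and its parameters are hypothetical.

```python
import math
from typing import Sequence


def plda_llr(x1: Sequence[float], x2: Sequence[float],
             b: Sequence[float], w: Sequence[float]) -> float:
    """Two-covariance PLDA log-likelihood ratio with diagonal covariances.

    Per dimension, a speaker factor s ~ N(0, b) plus noise e ~ N(0, w)
    gives x = s + e. Under the same-speaker hypothesis, x1 and x2 share s,
    so their joint covariance is [[b + w, b], [b, b + w]]. Under the
    different-speaker hypothesis it is [[b + w, 0], [0, b + w]].
    """
    llr = 0.0
    for a1, a2, bi, wi in zip(x1, x2, b, w):
        t = bi + wi
        # Same-speaker: determinant and quadratic form of the 2x2 covariance.
        det_same = t * t - bi * bi
        quad_same = (t * (a1 * a1 + a2 * a2) - 2.0 * bi * a1 * a2) / det_same
        # Different-speaker: diagonal covariance.
        det_diff = t * t
        quad_diff = (a1 * a1 + a2 * a2) / t
        # log N(same) - log N(diff); the log(2*pi) terms cancel.
        llr += 0.5 * (math.log(det_diff) - math.log(det_same)) \
            - 0.5 * (quad_same - quad_diff)
    return llr
```

A positive score favors the same-speaker hypothesis. Because the score is symmetric in `x1` and `x2`, enrollment and test embeddings can be passed in either order.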
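The abstract states that the SID and PID systems are fused at the score level, and that the x-vector/PLDA and SID/PLDA scores are fused before PID fusion. It does not say how the fusion is done. The following is a minimal sketch that assumes a weighted linear fusion of z-normalized scores applied in two stages. The weights, system names, and trial scores are hypothetical. In practice, fusion weights are usually learned on a development set, for example with logistic regression.

```python
from statistics import mean, pstdev
from typing import Dict, List


def znorm(scores: List[float]) -> List[float]:
    """Standardize one system's trial scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0.0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def linear_fusion(systems: Dict[str, List[float]],
                  weights: Dict[str, float]) -> List[float]:
    """Weighted sum of z-normalized scores.

    Every system must score the same trials in the same order.
    """
    normed = {name: znorm(s) for name, s in systems.items()}
    n_trials = len(next(iter(normed.values())))
    return [sum(weights[name] * normed[name][i] for name in normed)
            for i in range(n_trials)]


# Hypothetical scores for four trials from each system.
xvec = [2.1, -0.3, 1.7, -1.2]   # x-vector/PLDA
sid = [1.5, 0.2, 1.9, -0.8]     # SID embeddings scored with PLDA
pid = [0.9, 0.8, -1.0, -0.7]    # phrase (PID) match scores

# Stage 1: fuse the two speaker systems.
speaker = linear_fusion({"xvec": xvec, "sid": sid}, {"xvec": 0.5, "sid": 0.5})
# Stage 2: fuse the combined speaker score with the phrase score.
final = linear_fusion({"spk": speaker, "pid": pid}, {"spk": 0.7, "pid": 0.3})
```

In TD-ASV, a trial should be accepted only if both the speaker and the phrase match. In the toy data, trial 2 (right speaker, wrong phrase) therefore has its final score pulled down by the PID system.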
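The reported metric, minDCF, is the lowest normalized detection cost that any single decision threshold can reach. A relative improvement compares two such values as (baseline - system) / baseline. The sketch below assumes the operating point P_target = 0.01, C_miss = 10, C_fa = 1, which the abstract does not state. The function names are hypothetical. The threshold sweep is quadratic in the number of trials, which is acceptable for illustration but slow for large trial lists.

```python
from typing import List


def min_dcf(target_scores: List[float], nontarget_scores: List[float],
            p_target: float = 0.01, c_miss: float = 10.0,
            c_fa: float = 1.0) -> float:
    """Normalized minimum detection cost over all score thresholds.

    A trial is accepted when score >= threshold. The cost is divided by
    the cost of the best trivial system (always accept or always reject).
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))  # threshold that rejects every trial
    n_tar, n_non = len(target_scores), len(nontarget_scores)
    best = float("inf")
    for thr in thresholds:
        p_miss = sum(s < thr for s in target_scores) / n_tar
        p_fa = sum(s >= thr for s in nontarget_scores) / n_non
        cost = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        best = min(best, cost)
    default = min(c_miss * p_target, c_fa * (1.0 - p_target))
    return best / default


def relative_improvement(baseline: float, system: float) -> float:
    """Relative reduction of a cost metric such as minDCF, in percent."""
    return 100.0 * (baseline - system) / baseline
```

For example, a system that lowers minDCF from a hypothetical 0.50 to 0.25 has a 50% relative improvement. The 51.9% figure in the abstract is likewise a relative reduction, not an absolute one.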
