Interpretable Dysarthric Speaker Adaptation based on Optimal-Transport

Paper Title

Interpretable Dysarthric Speaker Adaptation based on Optimal-Transport

Authors

Rosanna Turrisi, Leonardo Badino

Abstract

This work addresses the mismatch between the distributions of training data (source) and testing data (target) in the challenging context of dysarthric speech recognition. We focus on Speaker Adaptation (SA) in command speech recognition, where data from multiple sources (i.e., multiple speakers) are available. Specifically, we propose an unsupervised Multi-Source Domain Adaptation (MSDA) algorithm based on optimal transport, called MSDA via Weighted Joint Optimal Transport (MSDA-WJDOT). We achieve relative Command Error Rate reductions of 16% and 7% over the speaker-independent model and the best competitor method, respectively. The strength of the proposed approach is that, unlike any other existing SA method, it offers an interpretable model that can also be exploited, in this context, to diagnose dysarthria without any specific training. Indeed, it provides a closeness measure between the target and the source speakers, reflecting their similarity in terms of speech characteristics. Based on the similarity between the target speaker and the healthy/dysarthric source speakers, we then define a healthy/dysarthric score for the target speaker, which we leverage to perform dysarthria detection. This approach does not require any additional training and achieves 95% accuracy in dysarthria diagnosis.
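The abstract does not give the formula for the healthy/dysarthric score, so the following is only a minimal sketch of one plausible reading. It assumes that the closeness measure is available as one non-negative weight per source speaker, and that each score is the share of total weight falling on healthy or dysarthric source speakers. The function names, the normalization step, and the example weights are all hypothetical and are not taken from the paper.

```python
def health_scores(weights, is_dysarthric):
    """Aggregate per-source-speaker weights into (healthy, dysarthric) scores.

    Assumption (not from the paper): each score is the normalized sum of the
    weights assigned to source speakers of that group.
    """
    if len(weights) != len(is_dysarthric):
        raise ValueError("weights and labels must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Normalize so the two scores sum to 1.
    dysarthric = sum(w for w, d in zip(weights, is_dysarthric) if d) / total
    healthy = sum(w for w, d in zip(weights, is_dysarthric) if not d) / total
    return healthy, dysarthric


def detect_dysarthria(weights, is_dysarthric):
    """Flag the target speaker as dysarthric when that score is larger."""
    healthy, dysarthric = health_scores(weights, is_dysarthric)
    return dysarthric > healthy


# Hypothetical weights for four source speakers (two healthy, two dysarthric).
example_weights = [0.05, 0.10, 0.50, 0.35]
example_labels = [False, False, True, True]
```

Under this reading no extra training is involved: the decision reuses the weights that adaptation already produces, which is consistent with the abstract's claim that detection needs no additional training.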
