Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection

Paper title

Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection

Authors

Jianwei Zhang, Julie Liss, Suren Jayasuriya, Visar Berisha

Abstract

Approximately 1.2% of the world's population has impaired voice production. As a result, automatic dysphonic voice detection has attracted considerable academic and clinical interest. However, existing methods for automated voice assessment often fail to generalize outside the training conditions or to other related applications. In this paper, we propose a deep learning framework for generating acoustic feature embeddings sensitive to vocal quality and robust across different corpora. A contrastive loss is combined with a classification loss to train our deep learning model jointly. Data warping methods are used on input voice samples to improve the robustness of our method. Empirical results demonstrate that our method not only achieves high in-corpus and cross-corpus classification accuracy but also generates good embeddings sensitive to voice quality and robust across different corpora. We also compare our results against three baseline methods on clean and three variations of deteriorated in-corpus and cross-corpus datasets and demonstrate that the proposed model consistently outperforms the baseline methods.
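
The abstract says a contrastive loss is combined with a classification loss but does not give the exact formulation. The sketch below only illustrates the general idea, under several assumptions that are not taken from the paper: a margin-based pairwise contrastive loss, a softmax cross-entropy classification loss, and a simple weighted sum. The names `joint_loss`, `weight`, and `margin` are hypothetical.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (logits: list of floats)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def pairwise_contrastive(emb_a, emb_b, same_class, margin=1.0):
    """Margin-based contrastive loss on one pair of embeddings (assumed form).

    Same-class pairs are pulled together; different-class pairs are
    pushed apart until their distance is at least `margin`.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(emb_a, emb_b)))
    if same_class:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


def joint_loss(emb_a, emb_b, logits_a, logits_b, label_a, label_b,
               weight=0.5, margin=1.0):
    """Classification loss plus a weighted contrastive term (hypothetical weighting)."""
    cls = 0.5 * (cross_entropy(logits_a, label_a)
                 + cross_entropy(logits_b, label_b))
    con = pairwise_contrastive(emb_a, emb_b, label_a == label_b, margin)
    return cls + weight * con
```

In a setup like this, the contrastive term shapes the embedding space while the classification term keeps the embeddings predictive of the dysphonic/healthy label; how the two are actually balanced in the paper is not stated in the abstract.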

Paper title