Paper Title

They are wearing a mask! Identification of Subjects Wearing a Surgical Mask from their Speech by means of x-vectors and Fisher Vectors

Authors

Egas-López, José Vicente

Abstract

The Computational Paralinguistics challenges held at the INTERSPEECH conference have always been well received by attendees, owing to their competitive academic and research demands. This year, the INTERSPEECH 2020 Computational Paralinguistics Challenge offers three different problems; here, the Mask Sub-Challenge is of specific interest. This sub-challenge involves classifying speech according to whether the speaker was wearing a surgical mask during recording. To address this problem, we employ two different types of feature extraction methods: x-vector embeddings, the current state-of-the-art approach for Speaker Recognition, and the Fisher Vector (FV), a method originally intended for Image Recognition that we use here to discriminate utterances. Both approaches are applied to two distinct frame-level representations: MFCC and PLP. Using Support Vector Machines (SVM) as the classifier, we compare the performance of the FV encodings with that of the x-vector embeddings on this particular classification task. We find that, for this specific dataset, the Fisher vector encodings provide better representations of the utterances than the x-vectors do. Moreover, we show that a fusion of our best configurations outperforms all the baseline scores of the Mask Sub-Challenge.
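To make the Fisher vector step more concrete, the following is a minimal sketch of how a variable-length sequence of frame-level features (for example MFCC or PLP frames) can be encoded as one fixed-length vector. It is not the authors' implementation. It assumes a diagonal-covariance Gaussian mixture model whose parameters are already known, and it applies the power and L2 normalization commonly used with Fisher vectors, which the abstract does not confirm. The function name `fisher_vector`, the toy GMM parameters, and the random frames are all hypothetical.

```python
import numpy as np

def fisher_vector(frames, weights, means, variances):
    """Encode a (T, D) matrix of frames as a Fisher vector of length 2*K*D.

    weights: (K,), means: (K, D), variances: (K, D) describe a
    diagonal-covariance GMM (assumed to be fitted beforehand).
    """
    T, _ = frames.shape
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)

    # Log-density of every frame under every Gaussian component: (T, K)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Posterior probability of each component for each frame (softmax)
    log_post = np.log(weights) + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                  # (T, K)

    # Gradients of the log-likelihood w.r.t. means and standard deviations
    norm_diff = diff / np.sqrt(variances)                    # (T, K, D)
    g_mu = np.einsum("tk,tkd->kd", post, norm_diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", post, norm_diff ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Assumed post-processing: signed square root, then L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

# Toy example: 200 random 13-dimensional frames and a hypothetical 4-component GMM
rng = np.random.default_rng(0)
K, D = 4, 13
frames = rng.normal(size=(200, D))
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
variances = np.ones((K, D))

fv = fisher_vector(frames, weights, means, variances)
print(fv.shape)
```

In a pipeline like the one described, one such vector per utterance would be passed to the SVM, with the GMM fitted on training frames rather than fixed by hand as it is here.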
