Classifying Speech Using Acoustic Features

Paper Title

Classification of Speech with and without Face Mask using Acoustic Features

Authors

Rohan Kumar Das, Haizhou Li

Abstract

The understanding and interpretation of speech can be affected by various external factors. The use of face masks is one such factor: a mask can obstruct speech during communication, which may degrade speech processing and affect how listeners perceive the speech. Knowing whether a speaker is wearing a mask may therefore be useful when modeling speech for different applications. With this motivation, detecting whether a speaker is wearing a face mask in a given speech sample is included as a task in the Computational Paralinguistics Evaluation (ComParE) 2020. We study novel acoustic features based on linear filterbanks, instantaneous phase and long-term information that can capture the artifacts needed to classify speech with and without a face mask. These acoustic features are used along with the state-of-the-art ComParE 2020 baselines: ComParE functionals, bag-of-audio-words, DeepSpectrum and auDeep features. The studies reveal the effectiveness of the acoustic features, and their score-level fusion with the ComParE 2020 baselines leads to an unweighted average recall of 73.50% on the test set.
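
The abstract reports its result as an unweighted average recall (UAR) obtained after score-level fusion. As a rough illustration only, the sketch below computes UAR as the mean of the per-class recalls and fuses two systems by averaging their scores. The averaging rule, the 0.5 decision threshold, the function names and the toy scores are all assumptions made for this example; the abstract does not state how the authors combined or thresholded their scores.

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so each class counts equally
    regardless of how many samples it has."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def fuse_scores(score_lists, weights=None):
    """Weighted average of per-system scores.
    Hypothetical fusion rule: the paper's actual rule is not given in the abstract."""
    scores = np.asarray(score_lists, dtype=float)  # shape: (n_systems, n_utterances)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=float) @ scores

# Toy data (invented): label 1 = mask, label 0 = no mask.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
system_a = [0.9, 0.3, 0.7, 0.2, 0.6, 0.1, 0.3, 0.2]
system_b = [0.6, 0.5, 0.4, 0.3, 0.3, 0.2, 0.6, 0.1]

fused = fuse_scores([system_a, system_b])
y_pred = (fused >= 0.5).astype(int)  # assumed decision threshold
uar = unweighted_average_recall(y_true, y_pred)
print(f"UAR = {uar:.4f}")
```

UAR is the usual choice when classes are imbalanced, because a classifier that always predicts the majority class scores only 50% on a two-class task, however skewed the data.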
