Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

Paper title

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

Authors

Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

Abstract

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track. The method of choice utilized a combination of spectro-temporal modulation and self-supervised features, followed by an encoder-decoder network organized in a multitask paradigm. We evaluate the complementarity between the tasks posed by examining independent task-specific and joint models, and explore the relative strengths of different feature sets. We also introduce a simple score fusion mechanism to leverage the complementarity of different feature sets for this task. We find that robust data preprocessing in conjunction with score fusion over spectro-temporal receptive field and HuBERT models achieved our best ExVo-MultiTask test score of 0.412.
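The abstract does not say how the score fusion is computed. As a minimal sketch, assuming fusion means a weighted average of per-model outputs, two models' predictions could be combined as below. The function name `fuse_scores`, the weights, and the example numbers are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_scores(score_sets, weights=None):
    """Weighted average of per-model prediction arrays of identical shape.

    Assumption: late fusion by averaging; the paper's actual rule may differ.
    """
    stacked = np.stack([np.asarray(s, dtype=float) for s in score_sets])
    if weights is None:
        # Default to equal weight for every model.
        weights = np.ones(len(score_sets))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    # Contract the model axis: result has the shape of a single score set.
    return np.tensordot(weights, stacked, axes=1)


# Hypothetical emotion-intensity outputs: 2 clips x 3 emotion dimensions.
strf_scores = [[0.2, 0.6, 0.1], [0.8, 0.4, 0.3]]
hubert_scores = [[0.4, 0.2, 0.3], [0.6, 0.8, 0.5]]

fused = fuse_scores([strf_scores, hubert_scores])
```

In practice the weights would be tuned on a validation set. The same averaging would apply to age regression outputs or country class posteriors.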

Paper title