Audio Self-supervised Learning: A Survey

Paper Title

Audio Self-supervised Learning: A Survey

Authors

Liu, Shuo, Mallol-Ragolta, Adria, Parada-Cabeleiro, Emilia, Qian, Kun, Jing, Xin, Kathan, Alexander, Hu, Bin, Schuller, Bjoern W.

Abstract

Inspired by humans' cognitive ability to generalise knowledge and skills, Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations, which are expensive and time-consuming to obtain. Its success in the fields of computer vision and natural language processing has prompted its recent adoption in the field of audio and speech processing. Comprehensive reviews summarising the knowledge in audio SSL are currently missing. To fill this gap, the present work provides an overview of the SSL methods used for audio and speech processing applications. Herein, we also summarise the empirical works that exploit the audio modality in multi-modal SSL frameworks, as well as the existing benchmarks suitable for evaluating the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out future directions for the development of audio SSL.
