End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network

Paper title

End-to-End Audio Strikes Back: Boosting Augmentations Towards An Efficient Audio Classification Network

Authors

Avi Gazneli, Gadi Zimerman, Tal Ridnik, Gilad Sharir, Asaf Noy

Abstract

While efficient architectures and a plethora of augmentations for end-to-end image classification tasks have been suggested and heavily investigated, state-of-the-art techniques for audio classifications still rely on numerous representations of the audio signal together with large architectures, fine-tuned from large datasets. By utilizing the inherited lightweight nature of audio and novel audio augmentations, we were able to present an efficient end-to-end network with strong generalization ability. Experiments on a variety of sound classification sets demonstrate the effectiveness and robustness of our approach, by achieving state-of-the-art results in various settings. Public code is available at: https://github.com/Alibaba-MIIL/AudioClassfication
