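Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context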

Paper title

Robust, General, and Low Complexity Acoustic Scene Classification Systems and An Effective Visualization for Presenting a Sound Scene Context

Authors

Lam Pham, Dusan Salovic, Anahid Jalali, Alexander Schindler, Khoa Tran, Canh Vu, Phu X. Nguyen

Abstract

In this paper, we present a comprehensive analysis of Acoustic Scene Classification (ASC), the task of identifying the scene of an audio recording from its acoustic signature. In particular, we first propose an Inception-based, low-footprint ASC model, referred to as the ASC baseline. The proposed ASC baseline is then compared with benchmark, high-complexity network architectures: MobileNetV1, MobileNetV2, VGG16, VGG19, ResNet50V2, ResNet152V2, DenseNet121, DenseNet201, and Xception. Next, we improve the ASC baseline by proposing a novel deep neural network architecture that leverages residual-inception architectures and multiple kernels. Given the novel residual-inception (NRI) model, we further evaluate the trade-off between model complexity and model accuracy. Finally, we evaluate whether sound events occurring in a sound scene recording can help to improve ASC accuracy, and then show how a sound scene context can be presented effectively by combining sound scene and sound event information. We conduct extensive experiments on various ASC datasets, including Crowded Scenes and the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 Task 1A and 1B, 2019 Task 1A and 1B, 2020 Task 1A, 2021 Task 1A, and 2022 Task 1. The experimental results on several different ASC challenges highlight two main achievements: the first is to propose robust, general, and low-complexity ASC systems that are suitable for real-life applications on a wide range of edge devices and mobile phones; the second is to propose an effective visualization method for comprehensively presenting a sound scene context.

Paper title