Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

Paper Title

Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

Authors

Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen

Abstract

Sound event localization and detection is a novel area of research that emerged from the combined interest of analyzing the acoustic scene in terms of the spatial and temporal activity of sounds of interest. This paper presents an overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge. A large-scale realistic dataset of spatialized sound events was generated for the challenge, to be used for training of learning-based approaches, and for evaluation of the submissions in an unlabeled subset. The overview presents in detail how the systems were evaluated and ranked and the characteristics of the best-performing systems. Common strategies in terms of input features, model architectures, training approaches, exploitation of prior knowledge, and data augmentation are discussed. Since ranking in the challenge was based on individually evaluating localization and event classification performance, part of the overview focuses on presenting metrics for the joint measurement of the two, together with a reevaluation of submissions using these new metrics. The new analysis reveals submissions that performed better on the joint task of detecting the correct type of event close to its original location than some of the submissions that were ranked higher in the challenge. Consequently, ranking of submissions which performed strongly when evaluated separately on detection or localization, but not jointly on both, was affected negatively.
