Title
Event-related data conditioning for acoustic event classification
Authors
Abstract
Models based on various attention mechanisms have recently performed well in tasks related to acoustic event classification (AEC). Among them, self-attention is often used in audio-only tasks to help models recognize different acoustic events. Self-attention relies on the similarity between time frames and uses global information from the whole segment to highlight specific features within a frame. In real life, information related to an acoustic event attenuates over time, which means that information within frames near the event deserves more attention than distant global information that may be unrelated to the event. This paper shows that self-attention may over-enhance certain segments of audio representations and smooth out the boundaries between event representations and background noise. Hence, this paper proposes event-related data conditioning (EDC) for AEC. EDC works directly on spectrograms. The idea of EDC is to adaptively select the attention range for each frame based on acoustic features and to gather event-related local information to represent that frame. Experiments show that: 1) EDC outperforms spectrogram-based data augmentation methods, trainable feature weighting, and self-attention in both the original-size mode and the augmented mode; 2) EDC effectively gathers event-related local information and enhances the boundaries between events and backgrounds, improving AEC performance.
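The abstract contrasts global self-attention, where every frame mixes in information from the whole segment, with a local, adaptively sized attention range. The following is a minimal NumPy sketch of that contrast under stated assumptions; it is not the authors' EDC. Here self-attention has no learned projections, and the "adaptive range" rule (extend each frame's window over neighbouring frames that fall on the same side of a mean-energy threshold, up to a hypothetical `max_radius`) is an illustrative assumption. The function names are hypothetical.

```python
import numpy as np


def global_self_attention(spec):
    """Scaled dot-product self-attention over all frames (no learned weights).

    spec: array of shape (T, F); each row is one spectrogram time frame.
    """
    T, F = spec.shape
    scores = spec @ spec.T / np.sqrt(F)             # (T, T) frame-to-frame similarity
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ spec                           # each frame mixes in the whole segment


def local_event_conditioning(spec, max_radius=3):
    """Hypothetical EDC-style sketch: each frame attends only to a local window.

    Assumption (not from the paper): the window grows over neighbouring frames
    with the same event/background label (energy above/below the mean),
    capped at max_radius frames on each side.
    """
    T, F = spec.shape
    energy = spec.mean(axis=1)
    is_event = energy > energy.mean()               # crude event/background split
    out = np.empty_like(spec)
    for t in range(T):
        lo = t
        while lo > 0 and t - lo < max_radius and is_event[lo - 1] == is_event[t]:
            lo -= 1
        hi = t
        while hi < T - 1 and hi - t < max_radius and is_event[hi + 1] == is_event[t]:
            hi += 1
        window = spec[lo:hi + 1]                    # local, label-consistent frames
        scores = window @ spec[t] / np.sqrt(F)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[t] = weights @ window                   # gather local information only
    return out


if __name__ == "__main__":
    spec = np.full((10, 4), 0.1)                    # background frames
    spec[4:7] = 1.0                                 # a short "event"
    g = global_self_attention(spec)
    l = local_event_conditioning(spec)
    print(g[4, 0] - g[0, 0])  # event/background contrast after global attention (below 0.9)
    print(l[4, 0] - l[0, 0])  # contrast after local conditioning (about 0.9, boundary kept)
```

In this toy input, global attention pulls the background frames toward the event frames and shrinks the contrast between them, while the local window never mixes across the event/background boundary. This mirrors, in a simplified way, the boundary-smoothing effect the abstract attributes to self-attention.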