Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning

Paper Title

Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning

Authors

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Abstract

We apply post-processing to the class probability distribution outputs of audio event classification models and employ reinforcement learning to jointly discover the optimal parameters for various stages of a post-processing stack, such as the classification thresholds and the kernel sizes of the median filters used to smooth out model predictions. To achieve this, we define a reinforcement learning environment in which: 1) a state is the class probability distribution provided by the model for a given audio sample, 2) an action is the choice of a candidate optimal value for each parameter of the post-processing stack, and 3) the reward is based on the classification accuracy metric we aim to optimize, which in our case is the audio event-based macro F1-score. We apply our post-processing to the class probability distribution outputs of two audio event classification models submitted to the DCASE 2020 Task 4 challenge. We find that by using reinforcement learning to discover the optimal per-class parameters for the post-processing stack applied to the outputs of audio event classification models, we can improve the audio event-based macro F1-score (the main metric used in the DCASE challenge to compare audio event classification accuracy) by 4-5% compared to using the same post-processing stack with manually tuned parameters.
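The post-processing stack described above can be sketched for a single clip as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes per-frame class probabilities, binarizes each class with its own threshold, smooths the binary sequence with a per-class median filter (edge values replicated), and merges runs of active frames into (onset, offset) events. The order of thresholding and filtering, the padding mode, the frame hop, and the helper names `median_filter_binary`, `frames_to_events` and `post_process` are all hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, float]  # (onset, offset) in seconds


def median_filter_binary(seq: List[int], kernel: int) -> List[int]:
    """Median-filter a 0/1 sequence with an odd kernel, replicating edge values."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    if not seq:
        return []
    half = kernel // 2
    padded = [seq[0]] * half + seq + [seq[-1]] * half
    # For 0/1 data, the median of an odd-length window is a majority vote.
    return [1 if sum(padded[i:i + kernel]) > half else 0 for i in range(len(seq))]


def frames_to_events(binary: List[int], frame_hop: float) -> List[Event]:
    """Merge runs of active frames into (onset, offset) pairs in seconds."""
    events: List[Event] = []
    start = None
    for i, active in enumerate(binary + [0]):  # trailing 0 closes an open run
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start * frame_hop, i * frame_hop))
            start = None
    return events


def post_process(
    probs: Dict[str, List[float]],
    thresholds: Dict[str, float],
    kernels: Dict[str, int],
    frame_hop: float,
) -> Dict[str, List[Event]]:
    """Per class: threshold, then median filter, then event extraction."""
    events: Dict[str, List[Event]] = {}
    for cls, p in probs.items():
        binary = [1 if x >= thresholds[cls] else 0 for x in p]
        smoothed = median_filter_binary(binary, kernels[cls])
        events[cls] = frames_to_events(smoothed, frame_hop)
    return events


if __name__ == "__main__":
    probs = {"Speech": [0.1, 0.7, 0.8, 0.2, 0.9, 0.85, 0.1, 0.05]}
    # Kernel 1 means no smoothing: the dip at frame 3 splits the event in two.
    print(post_process(probs, {"Speech": 0.5}, {"Speech": 1}, 0.25))
    # {'Speech': [(0.25, 0.75), (1.0, 1.5)]}
    # Kernel 3 fills the one-frame gap and yields a single event.
    print(post_process(probs, {"Speech": 0.5}, {"Speech": 3}, 0.25))
    # {'Speech': [(0.25, 1.5)]}
```

The example shows why the kernel size and the threshold interact: the same threshold produces one or two events depending on how much smoothing follows it, which is why tuning them jointly per class is attractive.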
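The reward is based on the audio event-based macro F1-score. The sketch below is a deliberately simplified version of that metric, assuming a 200 ms onset collar, an offset collar of max(200 ms, 20% of the reference event's length), and greedy one-to-one matching between predicted and reference events; these values follow common DCASE practice but are stated here as assumptions. Official DCASE scoring uses the `sed_eval` toolbox, which this code does not reproduce exactly. True positives, false positives and false negatives are accumulated per class over all clips, each class gets its own F1, and the macro score is the unweighted mean over classes. The function names `match_counts` and `macro_event_f1` are hypothetical, and scoring a class with no predictions and no references as 0 is an assumption.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, float]  # (onset, offset) in seconds


def match_counts(pred: List[Event], ref: List[Event],
                 collar: float = 0.2, pct: float = 0.2) -> Tuple[int, int, int]:
    """Greedy one-to-one matching; returns (tp, fp, fn) for one class in one clip."""
    unmatched = list(ref)
    tp = 0
    for on, off in pred:
        for r in unmatched:
            offset_collar = max(collar, pct * (r[1] - r[0]))
            if abs(on - r[0]) <= collar and abs(off - r[1]) <= offset_collar:
                unmatched.remove(r)  # safe: we break out of the loop immediately
                tp += 1
                break
    return tp, len(pred) - tp, len(unmatched)


def macro_event_f1(preds: List[Dict[str, List[Event]]],
                   refs: List[Dict[str, List[Event]]],
                   classes: List[str]) -> float:
    """Accumulate counts per class over all clips, then average the class F1s."""
    f1s = []
    for cls in classes:
        tp = fp = fn = 0
        for p, r in zip(preds, refs):
            t, f, n = match_counts(p.get(cls, []), r.get(cls, []))
            tp, fp, fn = tp + t, fp + f, fn + n
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)  # assumption for empty classes
    return sum(f1s) / len(f1s)


if __name__ == "__main__":
    preds = [{"Speech": [(0.25, 0.75), (1.0, 1.5)], "Dog": [(0.0, 1.0)]}]
    refs = [{"Speech": [(0.25, 1.5)], "Dog": [(0.1, 0.9)]}]
    # Speech: both fragments miss the reference (F1 = 0); Dog matches (F1 = 1).
    print(macro_event_f1(preds, refs, ["Speech", "Dog"]))  # 0.5
```

Because fragmented predictions earn no credit under an event-based metric, smoothing choices like the median-filter kernel can move this score substantially, which is what makes it a useful reward signal.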
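The abstract does not name the reinforcement learning algorithm, so the sketch below swaps in the simplest one-step formulation of the environment it describes: a per-class epsilon-greedy bandit over a discrete grid of candidate (threshold, kernel size) pairs. Each step samples a clip (the state), picks one candidate per class (the action), and receives a reward from a caller-supplied scoring function. The bandit ignores the state when choosing actions, which is a simplification of the setting in the paper. The class `EpsilonGreedyTuner`, the function `train`, the candidate grids and `reward_fn` are hypothetical.

```python
import random
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Arm = Tuple[float, int]  # (threshold, median-filter kernel size)


class EpsilonGreedyTuner:
    """One epsilon-greedy bandit per class over a joint (threshold, kernel) grid."""

    def __init__(self, classes: Sequence[str], thresholds: Sequence[float],
                 kernels: Sequence[int], epsilon: float = 0.1, seed: int = 0):
        self.classes = list(classes)
        self.arms: List[Arm] = list(product(thresholds, kernels))
        self.eps = epsilon
        self.rng = random.Random(seed)
        # Running mean reward and pull count for every (class, arm) pair.
        self.value: Dict[str, List[float]] = {c: [0.0] * len(self.arms) for c in self.classes}
        self.count: Dict[str, List[int]] = {c: [0] * len(self.arms) for c in self.classes}

    def act(self, cls: str) -> int:
        untried = [i for i, n in enumerate(self.count[cls]) if n == 0]
        if untried:
            return untried[0]  # warm start: try every candidate once
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.arms))  # explore
        return self._argmax(cls)  # exploit

    def update(self, cls: str, arm: int, reward: float) -> None:
        self.count[cls][arm] += 1
        # Incremental running mean of the rewards observed for this arm.
        self.value[cls][arm] += (reward - self.value[cls][arm]) / self.count[cls][arm]

    def best(self, cls: str) -> Arm:
        return self.arms[self._argmax(cls)]

    def _argmax(self, cls: str) -> int:
        vals = self.value[cls]
        return max(range(len(vals)), key=vals.__getitem__)


def train(tuner: EpsilonGreedyTuner, clips: Sequence[object],
          reward_fn: Callable[[str, Arm, object], float], steps: int) -> None:
    """Each step: sample a clip (state), choose per-class arms (actions), learn from rewards."""
    for _ in range(steps):
        clip = tuner.rng.choice(clips)
        for cls in tuner.classes:
            arm = tuner.act(cls)
            tuner.update(cls, arm, reward_fn(cls, tuner.arms[arm], clip))


if __name__ == "__main__":
    # Toy reward: pretend (0.5, 3) is best for "Speech" and (0.3, 5) for "Dog".
    target = {"Speech": (0.5, 3), "Dog": (0.3, 5)}

    def toy_reward(cls: str, arm: Arm, clip: object) -> float:
        return 1.0 if arm == target[cls] else 0.0

    tuner = EpsilonGreedyTuner(["Speech", "Dog"], [0.3, 0.5, 0.7], [1, 3, 5])
    train(tuner, clips=[None], reward_fn=toy_reward, steps=50)
    print(tuner.best("Speech"), tuner.best("Dog"))  # (0.5, 3) (0.3, 5)
```

In a realistic setup, `reward_fn` would post-process the sampled clip's class probabilities with the chosen threshold and kernel and score the resulting events against the clip's reference annotations. Treating each class independently keeps the action space small (one grid per class rather than the product over all classes), at the cost of ignoring any interaction between classes.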
