Paper Title

Mutual Modality Learning for Video Action Classification

Paper Authors

Stepan Komkov, Maksim Dzabraev, Aleksandr Petiushko

Paper Abstract

The construction of models for video action classification progresses rapidly. However, the performance of those models can still be easily improved by ensembling them with the same models trained on other modalities (e.g., optical flow). Unfortunately, using several modalities during inference is computationally expensive. Recent works examine ways to integrate the advantages of multi-modality into a single RGB model. Yet there is still room for improvement. In this paper, we explore various methods to embed the ensemble power into a single model. We show that proper initialization, as well as mutual modality learning, enhances single-modality models. As a result, we achieve state-of-the-art results on the Something-Something-v2 benchmark.
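The "mutual modality learning" the abstract alludes to follows the deep-mutual-learning pattern: each modality's network minimizes its own classification loss plus a KL term pulling its prediction toward the peer modality's prediction. Below is a minimal NumPy sketch of that per-sample objective; the function names, the `alpha` weight, and the symmetric KL coupling are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL(p || q) for two discrete distributions (assumes no zero entries).
    return float(np.sum(p * np.log(p / q)))

def mutual_learning_losses(logits_rgb, logits_flow, label, alpha=1.0):
    """Per-sample losses for two peer models (RGB and optical flow).

    Each model gets its own cross-entropy on the ground-truth label,
    plus a KL term encouraging it to match the other model's output
    (deep-mutual-learning style; `alpha` balances the two terms).
    """
    p = softmax(logits_rgb)
    q = softmax(logits_flow)
    ce_rgb = -np.log(p[label])
    ce_flow = -np.log(q[label])
    loss_rgb = ce_rgb + alpha * kl(q, p)    # RGB mimics the flow model
    loss_flow = ce_flow + alpha * kl(p, q)  # flow mimics the RGB model
    return loss_rgb, loss_flow
```

When the two models already agree (identical logits), both KL terms vanish and each loss reduces to the plain cross-entropy, so the coupling only acts where the modalities disagree.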
