Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

Paper title

Self-supervised learning using consistency regularization of spatio-temporal data augmentation for action recognition

Authors

Jinpeng Wang, Yiqi Lin, Andy J. Ma

Abstract

Self-supervised learning has shown great potential for improving deep learning models in an unsupervised manner by constructing surrogate supervision signals directly from unlabeled data. Unlike existing work, we present a novel way to obtain the surrogate supervision signal from high-level feature maps under consistency regularization. Specifically, we propose a spatio-temporal consistency regularization between the output features of a siamese network that has two paths: a clean path fed with the original video and a noise path fed with the corresponding augmented video. Based on the spatio-temporal characteristics of video, we develop two video-based data augmentation methods: Spatio-Temporal Transformation and Intra-Video Mixup. Consistency under the former models the transformation consistency of features, while the latter aims to retain spatial invariance so that action-related features are extracted. Extensive experiments demonstrate that our method achieves substantial improvements over state-of-the-art self-supervised learning methods for action recognition. When our method is used as an additional regularization term and combined with current surrogate supervision signals, we achieve a 22% relative improvement over the previous state of the art on HMDB51 and 7% on UCF101.
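
The two-path idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not define the augmentations or the loss, so everything here is an assumption made for illustration: `intra_video_mixup` blends each frame with one randomly chosen frame of the same video, `toy_encoder` is a hypothetical stand-in for the shared-weight backbone, and the consistency term is a plain mean squared error.

```python
import numpy as np


def intra_video_mixup(video, lam, rng):
    """Blend every frame with one random frame of the same video (assumed form).

    video: float array of shape (T, H, W, C).
    lam:   weight in [0, 1] kept on the original frames.
    """
    t = rng.integers(video.shape[0])   # index of the frame used as the mixing source
    static = video[t]                  # shape (H, W, C), broadcasts over the time axis
    return lam * video + (1.0 - lam) * static


def toy_encoder(video, weights):
    """Hypothetical stand-in for a shared-weight video backbone.

    Global-average-pools over time and space, then applies a linear map.
    """
    pooled = video.mean(axis=(0, 1, 2))  # shape (C,)
    return pooled @ weights              # shape (D,)


def consistency_loss(clean_feat, noisy_feat):
    """Mean squared error between the two paths (assumed choice of distance)."""
    return float(np.mean((clean_feat - noisy_feat) ** 2))


rng = np.random.default_rng(0)
video = rng.random((8, 16, 16, 3))       # 8 random frames standing in for a clip
weights = rng.standard_normal((3, 4))    # the same weights serve both paths

augmented = intra_video_mixup(video, lam=0.7, rng=rng)
clean_feat = toy_encoder(video, weights)      # clean path: original video
noisy_feat = toy_encoder(augmented, weights)  # noise path: augmented video
loss = consistency_loss(clean_feat, noisy_feat)
```

In a real training setup the encoder would be a learned video network and this loss would be minimized, possibly alongside another surrogate objective, which is how the abstract describes using the method as an additional regularization term.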
