Title
Learning Using Privileged Information for Zero-Shot Action Recognition
Authors
Abstract
Zero-Shot Action Recognition (ZSAR) aims to recognize video actions that have never been seen during training. Most existing methods assume a semantic space shared by seen and unseen actions and attempt to learn a direct mapping from the visual space to this semantic space. Such approaches are hampered by the semantic gap between the visual space and the semantic space. This paper presents a novel method that uses object semantics as privileged information to narrow the semantic gap and thereby assist learning effectively. In particular, a simple hallucination network is proposed to implicitly extract object semantics during testing without explicitly extracting objects, and a cross-attention module is developed to augment the visual features with the object semantics. Experiments on the Olympic Sports, HMDB51 and UCF101 datasets show that the proposed method outperforms state-of-the-art methods by a large margin.
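The abstract names two components, a hallucination network and a cross-attention module, without specifying their architectures. The sketch below illustrates only the general learning-using-privileged-information pattern, under assumptions of our own: a linear hallucinator (hypothetical `train_hallucinator`) fitted with squared error to reproduce privileged object-semantic vectors that, in practice, would come from an object extractor available only at training time, and a single-head scaled dot-product cross-attention (hypothetical `cross_attention`) in which the visual feature queries the hallucinated object vectors and the result is added back residually. All dimensions, the synthetic "object extractor" `A`, and the random projections are illustrative; this is not the paper's implementation. Only the Python standard library is used.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(visual, objects, Wq, Wk, Wv):
    """Single-head scaled dot-product cross-attention (illustrative).

    The visual feature is the query; each object-semantic vector supplies a
    key and a value. The attended value is added back to the visual feature
    (residual connection), so Wv must map object vectors to len(visual).
    """
    q = matvec(Wq, visual)
    keys = [matvec(Wk, o) for o in objects]
    values = [matvec(Wv, o) for o in objects]
    scale = math.sqrt(len(q))
    weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
    attended = [sum(w * v[i] for w, v in zip(weights, values))
                for i in range(len(visual))]
    return [x + a for x, a in zip(visual, attended)], weights


def train_hallucinator(pairs, d_in, d_out, lr=0.05, epochs=200):
    """Fit a linear hallucinator H (d_out x d_in) by per-sample gradient
    descent on ||H @ visual - privileged||^2. The privileged object
    semantics are consumed only here, during training."""
    H = [[0.0] * d_in for _ in range(d_out)]
    for _ in range(epochs):
        for visual, privileged in pairs:
            err = [p - t for p, t in zip(matvec(H, visual), privileged)]
            for i in range(d_out):
                for j in range(d_in):
                    H[i][j] -= lr * 2 * err[i] * visual[j]
    return H


def split(vec, size):
    """Split a flat vector into consecutive chunks of length `size`."""
    return [vec[i:i + size] for i in range(0, len(vec), size)]


def demo(seed=0):
    rng = random.Random(seed)
    d_v, k, d_o, d_a = 3, 2, 2, 2  # visual, objects per clip, object, attention dims
    # Hypothetical linear "object extractor" that yields privileged semantics.
    A = [[rng.uniform(-1, 1) for _ in range(d_v)] for _ in range(k * d_o)]
    pairs = []
    for _ in range(20):
        v = [rng.uniform(-1, 1) for _ in range(d_v)]
        pairs.append((v, matvec(A, v)))
    H = train_hallucinator(pairs, d_v, k * d_o)

    # Test time: only the visual feature is available; object semantics are
    # hallucinated from it instead of being extracted.
    test_v = [rng.uniform(-1, 1) for _ in range(d_v)]
    hallucinated = matvec(H, test_v)
    max_err = max(abs(h - t) for h, t in zip(hallucinated, matvec(A, test_v)))

    # Random (untrained) projections, just to exercise the attention step.
    Wq = [[rng.gauss(0, 1) for _ in range(d_v)] for _ in range(d_a)]
    Wk = [[rng.gauss(0, 1) for _ in range(d_o)] for _ in range(d_a)]
    Wv = [[rng.gauss(0, 1) for _ in range(d_o)] for _ in range(d_v)]
    fused, weights = cross_attention(test_v, split(hallucinated, d_o), Wq, Wk, Wv)
    return max_err, weights, fused


max_err, weights, fused = demo()
print(f"hallucination error on a held-out clip: {max_err:.2e}")
print("attention weights over hallucinated objects:", [round(w, 3) for w in weights])
```

Because the synthetic privileged semantics here are an exact linear function of the visual feature, the linear hallucinator can recover them almost perfectly. Real object semantics would not behave this way, so a practical system would likely need a nonlinear hallucinator and learned attention projections trained on the recognition objective; the abstract does not say how the authors handle this.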