Paper title
RareAct: A video dataset of unusual interactions
Paper authors
Paper abstract
This paper introduces RareAct, a manually annotated video dataset of unusual actions such as "blend phone", "cut keyboard" and "microwave shoes". RareAct aims to evaluate the zero-shot and few-shot compositionality of action recognition models on unlikely compositions of common action verbs and object nouns. It contains 122 different actions, obtained by combining verbs and nouns that rarely co-occur in the large-scale textual corpus from HowTo100M but frequently appear separately. We provide benchmarks using a state-of-the-art HowTo100M pretrained video and text model and show that zero-shot and few-shot compositionality of actions remains a challenging and unsolved task.
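The selection criterion described in the abstract (verbs and nouns that are individually frequent but rarely co-occur) can be illustrated with a small sketch. This is not the authors' procedure: the function name `rare_pairs`, the thresholds `min_single` and `max_joint`, whitespace tokenisation, and caption-level co-occurrence counting are all assumptions made for illustration, and the captions are a toy corpus rather than HowTo100M data.

```python
from collections import Counter
from itertools import product


def rare_pairs(captions, verbs, nouns, min_single=2, max_joint=0):
    """Return (verb, noun) pairs whose words are each frequent in the
    corpus but (almost) never appear in the same caption.

    Hypothetical helper: thresholds and tokenisation are assumptions.
    `verbs` and `nouns` must be sets of lowercase words.
    """
    verb_count, noun_count, pair_count = Counter(), Counter(), Counter()
    for caption in captions:
        tokens = set(caption.lower().split())  # naive whitespace tokenisation
        seen_verbs = tokens & verbs
        seen_nouns = tokens & nouns
        verb_count.update(seen_verbs)           # captions containing each verb
        noun_count.update(seen_nouns)           # captions containing each noun
        pair_count.update(product(seen_verbs, seen_nouns))  # co-occurrences
    return sorted(
        (v, n)
        for v, n in product(verbs, nouns)
        if verb_count[v] >= min_single
        and noun_count[n] >= min_single
        and pair_count[(v, n)] <= max_joint
    )


# Toy corpus standing in for a large narration corpus.
captions = [
    "blend the fruit",
    "blend fruit and ice",
    "cut the fruit",
    "cut paper",
    "answer the phone",
    "charge the phone",
]
pairs = rare_pairs(captions, verbs={"blend", "cut"}, nouns={"fruit", "phone"})
print(pairs)
```

On this toy corpus, "blend" and "cut" each occur in two captions and "phone" in two, yet neither verb shares a caption with "phone", so both pairs pass the filter, while the pairs with "fruit" are rejected because they do co-occur.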