SAFCAR: Structured Attention Fusion for Compositional Action Recognition

Paper title

SAFCAR: Structured Attention Fusion for Compositional Action Recognition

Authors

Tae Soo Kim, Gregory D. Hager

Abstract

We present a general framework for compositional action recognition, i.e., action recognition where the labels are composed of simpler components such as subjects, atomic actions, and objects. The main challenge in compositional action recognition is that there is a combinatorially large set of possible actions that can be composed from basic components. However, compositionality also provides a structure that can be exploited. To do so, we develop and test a novel Structured Attention Fusion (SAF) self-attention mechanism to combine information from object detections, which capture the time-series structure of an action, with visual cues that capture contextual information. We show that our approach recognizes novel verb-noun compositions more effectively than current state-of-the-art systems, and that it generalizes efficiently to unseen action categories from only a few labeled examples. We validate our approach on the challenging Something-Else tasks from the Something-Something-V2 dataset. We further show that our framework is flexible and can generalize to a new domain by showing competitive results on the Charades-Fewshot dataset.
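
The abstract does not specify how SAF is built, so the following is only a minimal sketch of the general idea of attention-based fusion, not the authors' mechanism. It uses plain scaled dot-product attention, in which a context feature acts as the query over a set of per-object features. The names `attention_fuse`, `context`, and `objects`, and all numeric values, are assumptions made up for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, tokens):
    """Fuse `tokens` into one vector with scaled dot-product attention.

    query  : list[float], a hypothetical visual-context feature
    tokens : list[list[float]], hypothetical per-object features
    Returns (fused_vector, attention_weights).
    """
    d = len(query)
    # Similarity between the query and each token, scaled by sqrt(d).
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(d)
        for tok in tokens
    ]
    weights = softmax(scores)
    # Weighted sum of the tokens, one output value per feature dimension.
    fused = [
        sum(w * tok[i] for w, tok in zip(weights, tokens))
        for i in range(d)
    ]
    return fused, weights


# Toy inputs, invented for illustration only.
context = [1.0, 0.0, 1.0, 0.0]
objects = [
    [1.0, 0.0, 1.0, 0.0],  # most similar to the context
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the context
    [0.5, 0.5, 0.5, 0.5],  # partially similar
]
fused, weights = attention_fuse(context, objects)
```

In this toy setup the object feature most similar to the context receives the largest weight, so the fused vector is dominated by it. A learned model would additionally apply trained projections to the query, keys, and values, which this sketch omits.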
