Title

Understanding Dynamic Scenes using Graph Convolution Networks

Authors

Sravan Mylavarapu, Mahtab Sandhu, Priyesh Vijayan, K Madhava Krishna, Balaraman Ravindran, Anoop Namboodiri

Abstract

We present a novel Multi-Relational Graph Convolutional Network (MRGCN) based framework to model on-road vehicle behaviors from a sequence of temporally ordered frames captured by a moving monocular camera. The input to the MRGCN is a multi-relational graph whose nodes represent the active and passive agents/objects in the scene, and whose bidirectional edges, connecting every pair of nodes, encode their spatio-temporal relations. We show that this explicit encoding and use of an intermediate spatio-temporal interaction graph is better suited to our task than learning end-to-end directly on a set of temporally ordered spatial relations. We also propose an attention mechanism for MRGCNs that, conditioned on the scene, dynamically scores the importance of information from different interaction types. The proposed framework achieves significant performance gains over prior methods on vehicle-behavior classification tasks across four datasets. We also show seamless transfer of learning to multiple datasets without resorting to fine-tuning. Such behavior prediction methods find immediate relevance in a variety of navigation tasks such as behavior planning, state estimation, and applications relating to the detection of traffic violations in videos.
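To make the described architecture concrete, below is a minimal PyTorch sketch of one multi-relational graph convolution layer with scene-conditioned attention over relation types. It is an illustration under assumptions, not the authors' implementation: the class name MRGCNLayer, the mean-pooled scene summary, and the per-relation linear transforms are all hypothetical choices standing in for the details given in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MRGCNLayer(nn.Module):
    """One multi-relational graph convolution layer with relation-type attention.

    Hypothetical sketch: not the authors' implementation.
    """

    def __init__(self, in_dim: int, out_dim: int, num_relations: int):
        super().__init__()
        # One linear transform per relation type (R-GCN style).
        self.rel_weights = nn.ModuleList(
            nn.Linear(in_dim, out_dim, bias=False) for _ in range(num_relations)
        )
        self.self_loop = nn.Linear(in_dim, out_dim, bias=False)
        # Scene-conditioned scorer (an assumption): maps a global scene
        # summary to one attention weight per relation type.
        self.rel_attn = nn.Linear(in_dim, num_relations)

    def forward(self, x: torch.Tensor, adjs: torch.Tensor) -> torch.Tensor:
        # x: (N, in_dim) node features for agents/objects in the scene.
        # adjs: (R, N, N) row-normalized adjacency, one slice per relation.
        scene = x.mean(dim=0)  # crude global scene summary (an assumption)
        alpha = torch.softmax(self.rel_attn(scene), dim=-1)  # (R,) scores
        out = self.self_loop(x)
        for r, (w, adj) in enumerate(zip(self.rel_weights, adjs)):
            out = out + alpha[r] * (adj @ w(x))  # attention-weighted messages
        return F.relu(out)


if __name__ == "__main__":
    # Toy scene: 5 agents/objects, 3 interaction (relation) types.
    num_nodes, num_relations = 5, 3
    x = torch.randn(num_nodes, 8)
    adjs = torch.rand(num_relations, num_nodes, num_nodes)
    adjs = adjs / adjs.sum(dim=-1, keepdim=True)  # row-normalize each slice
    layer = MRGCNLayer(in_dim=8, out_dim=16, num_relations=num_relations)
    print(layer(x, adjs).shape)  # torch.Size([5, 16])
```

Scoring whole relation types rather than individual edges mirrors the abstract's description of weighting "information from different interaction types" per scene; a full model would presumably stack such layers over the intermediate spatio-temporal interaction graph and attach a behavior classifier on top.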
