Title
RADNet: A Deep Neural Network Model for Robust Perception in Moving Autonomous Systems
Authors
Abstract
Interactive autonomous applications require a perception engine that is robust to artifacts in unconstrained videos. In this paper, we examine the effect of camera motion on the task of action detection. We develop a novel method to rank videos by the degree of global camera motion, and show that action detection accuracy decreases on highly ranked (high camera motion) videos. We propose an action detection pipeline that is robust to camera motion effects and verify it empirically. Specifically, we align actor features across frames and couple global scene features with local actor-specific features. Feature alignment uses a novel formulation of the Spatio-temporal Sampling Network (STSN), extended with multi-scale offset prediction and refinement via a pyramid structure. We also propose a novel input-dependent weighted averaging strategy for fusing local and global features. We demonstrate the applicability of our network on our dataset of moving-camera videos with high camera motion (the MOVE dataset), with a 4.1% increase in frame mAP and a 17% increase in video mAP.
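The input-dependent weighted averaging mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a simple sigmoid gate predicted from the concatenated local and global feature vectors (the gate weights `w_gate` and bias `b_gate` are hypothetical learned parameters), with the fused feature computed as a convex combination of the two inputs.

```python
import numpy as np

def fuse(local_feat, global_feat, w_gate, b_gate):
    """Input-dependent weighted average of local and global features.

    A scalar gate alpha in (0, 1) is predicted from the concatenated
    features; the fused output is alpha * local + (1 - alpha) * global.
    (Illustrative sketch only; the gate parametrization is assumed.)
    """
    x = np.concatenate([local_feat, global_feat])
    alpha = 1.0 / (1.0 + np.exp(-(w_gate @ x + b_gate)))  # sigmoid gate
    return alpha * local_feat + (1.0 - alpha) * global_feat, alpha

# Toy example with random features and gate parameters.
rng = np.random.default_rng(0)
d = 4
local_feat = rng.standard_normal(d)    # e.g. actor-specific feature
global_feat = rng.standard_normal(d)   # e.g. global scene feature
w_gate = rng.standard_normal(2 * d)    # hypothetical learned gate weights
b_gate = 0.0

fused, alpha = fuse(local_feat, global_feat, w_gate, b_gate)
print(alpha, fused)
```

Because the weight alpha depends on the input features themselves, the fusion can lean on global scene context for some frames and on local actor features for others, rather than using a fixed mixing ratio.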