Tensor Representations for Action Recognition

Paper Title

Tensor Representations for Action Recognition

Authors

Piotr Koniusz, Lei Wang, Anoop Cherian

Abstract

Human actions in video sequences are characterized by the complex interplay between spatial features and their temporal dynamics. In this paper, we propose novel tensor representations for compactly capturing such higher-order relationships between visual features for the task of action recognition. We propose two tensor-based feature representations, viz. (i) the sequence compatibility kernel (SCK) and (ii) the dynamics compatibility kernel (DCK). SCK builds on the spatio-temporal correlations between features, whereas DCK explicitly models the action dynamics of a sequence. We also explore a generalization of SCK, coined SCK(+), that operates on subsequences to capture the local-global interplay of correlations and can incorporate multi-modal inputs, e.g., skeleton 3D body-joints and per-frame classifier scores obtained from deep learning models trained on videos. We introduce linearizations of these kernels that lead to compact and fast descriptors. We provide experiments on (i) 3D skeleton action sequences, (ii) fine-grained video sequences, and (iii) standard non-fine-grained videos. As our final representations are tensors that capture higher-order relationships of features, they relate to co-occurrences for robust fine-grained recognition. We use higher-order tensors and so-called Eigenvalue Power Normalization (EPN), which have long been speculated to perform spectral detection of higher-order occurrences, thus detecting fine-grained relationships of features rather than merely counting features in action sequences. We prove that a tensor of order r, built from Z*-dimensional features and coupled with EPN, indeed detects whether at least one higher-order occurrence is 'projected' into one of its binom(Z*, r) subspaces of dimension r represented by the tensor, thus forming a Tensor Power Normalization metric endowed with binom(Z*, r) such 'detectors'.
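
To make the idea of Eigenvalue Power Normalization more concrete, the following is a minimal sketch restricted to the simplest case, an order-2 tensor (a matrix) built from per-frame feature vectors. It is not the authors' implementation: the abstract concerns tensors of general order r, and the function names, the random input data, and the exponent gamma = 0.5 are all assumptions chosen for illustration.

```python
import numpy as np

def second_order_tensor(features):
    """Average outer product of feature vectors: an order-2 tensor (hypothetical helper)."""
    X = np.asarray(features, dtype=float)   # shape (N, Z): N frames, Z-dim features
    return X.T @ X / X.shape[0]             # shape (Z, Z), symmetric positive semi-definite

def eigenvalue_power_normalization(M, gamma=0.5):
    """Raise the eigenvalues of a symmetric PSD matrix to the power gamma.

    gamma is an assumed illustrative parameter; values in (0, 1) flatten the spectrum.
    """
    eigvals, eigvecs = np.linalg.eigh(M)    # eigendecomposition of a symmetric matrix
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    return (eigvecs * eigvals**gamma) @ eigvecs.T

# Hypothetical input: 50 frames of 4-dimensional features.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 4))

M = second_order_tensor(frames)
M_epn = eigenvalue_power_normalization(M, gamma=0.5)
```

With gamma = 0.5 the result is the matrix square root of M, so large eigenvalues are damped relative to small ones. This is one way to read the abstract's contrast between detecting whether a correlation is present and merely counting how often features occur.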
