Paper Title


Pose Refinement Graph Convolutional Network for Skeleton-based Action Recognition

Authors

Shijie Li, Jinhui Yi, Yazan Abu Farha, Juergen Gall

Abstract


With the advances in capturing 2D or 3D skeleton data, skeleton-based action recognition has received increasing interest over the past few years. As skeleton data is commonly represented by graphs, graph convolutional networks have been proposed for this task. While current graph convolutional networks accurately recognize actions, they are too expensive for robotics applications where only limited computational resources are available. In this paper, we therefore propose a highly efficient graph convolutional network that addresses the limitations of previous works. This is achieved by a parallel structure that gradually fuses motion and spatial information and by reducing the temporal resolution as early as possible. Furthermore, we explicitly address the issue that human poses can contain errors. To this end, the network first refines the poses before they are further processed to recognize the action. We therefore call the network Pose Refinement Graph Convolutional Network. Compared to other graph convolutional networks, our network requires 86%-93% fewer parameters and reduces the floating point operations by 89%-96% while achieving comparable accuracy. It therefore provides a much better trade-off between accuracy, memory footprint and processing time, which makes it suitable for robotics applications.
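As a rough illustration of the spatial graph convolution such networks build on (this is not the authors' implementation; the toy skeleton, random features, and weights below are assumptions for demonstration), a single layer can be sketched as propagating joint features over the degree-normalized skeleton adjacency matrix:

```python
import numpy as np

def spatial_graph_conv(X, A, W):
    """One spatial graph-convolution layer: X' = D^-1 (A + I) X W.

    X: (num_joints, in_channels) per-joint features (e.g. 2D/3D coordinates).
    A: (num_joints, num_joints) skeleton adjacency (1 where two joints share a bone).
    W: (in_channels, out_channels) learnable weights (random here, for illustration).
    """
    A_hat = A + np.eye(A.shape[0])             # add self-connections
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # row-wise degree normalization
    return D_inv @ A_hat @ X @ W               # aggregate neighbors, then project

# Hypothetical 4-joint chain skeleton: 0-1-2-3
A = np.zeros((4, 4))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))   # 3D coordinates per joint
W = rng.standard_normal((3, 8))   # project to 8 feature channels
out = spatial_graph_conv(X, A, W)
print(out.shape)  # (4, 8)
```

A full network stacks such layers (with additional temporal convolutions across frames); the paper's contribution lies in how the pose-refinement stage, the parallel motion/spatial branches, and early temporal downsampling are arranged around this basic operation.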
