Paper Title
Group-Skeleton-Based Human Action Recognition in Complex Events
Paper Authors
Paper Abstract
Human action recognition, as an important application of computer vision, has been studied for decades. Among various approaches, skeleton-based methods have recently attracted increasing attention due to their robustness and superior performance. However, existing skeleton-based methods ignore the potential action relationships between different persons, even though one person's action is highly likely to be influenced by another's, especially in complex events. In this paper, we propose a novel group-skeleton-based human action recognition method for complex events. The method first utilizes multi-scale spatial-temporal graph convolutional networks (MS-G3Ds) to extract skeleton features from multiple persons. In addition to the traditional key point coordinates, we also feed key point speed values into the networks for better performance. Then we use multilayer perceptrons (MLPs) to embed the distance values between the reference person and other persons into the extracted features. Finally, all the features are fed into another MS-G3D for feature fusion and classification. To mitigate the class imbalance problem, the networks are trained with a focal loss. The proposed algorithm is also our solution for the Large-scale Human-centric Video Analysis in Complex Events Challenge. Results on the HiEve dataset show that our method achieves superior performance compared to other state-of-the-art methods.
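The abstract mentions feeding key point speed values to the networks alongside the coordinates. A common way to derive such speeds is a first-order temporal difference of the joint coordinates; the sketch below assumes this frame-difference formulation, since the abstract does not specify how the speeds are computed.

```python
import numpy as np

def keypoint_speeds(coords):
    """Approximate per-joint speed as frame-to-frame coordinate differences.

    coords: (T, V, 2) array of T frames, V joints, (x, y) pixel coordinates.
    Returns a (T, V, 2) speed array; the first frame is zero-padded so the
    result can be stacked channel-wise with the coordinate input.
    Note: the frame-difference definition is an assumption -- the paper only
    states that speed values are input to the network with the coordinates.
    """
    speeds = np.zeros_like(coords)
    speeds[1:] = coords[1:] - coords[:-1]
    return speeds
```

Zero-padding the first frame keeps the speed tensor the same shape as the coordinate tensor, so both can be concatenated along the channel axis before entering the MS-G3D backbone.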
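The networks are trained with a focal loss to counter class imbalance. For reference, here is a minimal NumPy sketch of the standard multi-class focal loss (Lin et al., 2017); the `alpha` and `gamma` values are illustrative defaults, not the paper's settings.

```python
import numpy as np

def focal_loss(probs, labels, alpha=0.25, gamma=2.0):
    """Multi-class focal loss, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t).

    probs:  (N, C) predicted class probabilities (e.g. softmax outputs)
    labels: (N,)   integer ground-truth class indices
    The (1 - p_t)^gamma factor down-weights well-classified examples, so
    hard or rare classes contribute more to the gradient -- which is why it
    helps with class-imbalanced action labels.
    alpha/gamma defaults are illustrative, not the paper's hyperparameters.
    """
    p_t = probs[np.arange(len(labels)), labels]  # probability of true class
    return float(np.mean(-alpha * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With `gamma = 0` the expression reduces to an alpha-weighted cross-entropy; increasing `gamma` shrinks the loss on confident predictions while leaving hard examples nearly untouched.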