AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation

Paper Title

AMPose: Alternately Mixed Global-Local Attention Model for 3D Human Pose Estimation

Authors

Hongxin Lin, Yunwei Chiu, Peiyuan Wu

Abstract

Graph convolutional networks (GCNs) have been applied to model the physically connected and non-local relations among human joints for 3D human pose estimation (HPE). In addition, purely Transformer-based models have recently shown promising results in video-based 3D HPE. However, single-frame methods still need to model the physically connected relations among joints, because feature representations transformed only through the global relations of the Transformer neglect information about the human skeleton. To address this problem, we propose AMPose, a novel method in which Transformer encoder and GCN blocks are alternately stacked to combine the global and physically connected relations among joints for HPE. In AMPose, the Transformer encoder connects each joint with all the other joints, while the GCNs capture information on physically connected relations. The effectiveness of the proposed method is evaluated on the Human3.6M dataset. Our model also shows better generalization ability when tested on the MPI-INF-3DHP dataset. Code is available at https://github.com/erikervalid/AMPose.
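
The alternation described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (see the repository linked above for that): it assumes identity query/key/value projections, no learned weights, a single attention head, and a hypothetical five-joint chain skeleton (`TOY_BONES`) in place of a real dataset skeleton. All function names are illustrative. It only shows the data flow, in which a global step mixes every joint with every other joint and a local step mixes each joint with its physically connected neighbours.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices (used for residuals)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(x):
    """Global step: scaled dot-product self-attention over all joints.
    Assumption: identity Q/K/V projections instead of learned ones."""
    d = len(x[0])
    scores = matmul(x, [list(col) for col in zip(*x)])  # x @ x^T
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return add(x, matmul(weights, x))  # residual connection

def normalized_adjacency(num_joints, bones):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of the skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in bones:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]

def local_gcn(x, a_norm):
    """Local step: each joint aggregates features from itself and its
    physically connected neighbours. Assumption: no learned weight matrix."""
    return add(x, matmul(a_norm, x))  # residual connection

def alternating_stack(x, a_norm, depth=2):
    """Alternate the global (attention) and local (GCN) steps `depth` times."""
    for _ in range(depth):
        x = global_attention(x)
        x = local_gcn(x, a_norm)
    return x

# Hypothetical toy skeleton: five joints connected in a chain.
TOY_BONES = [(0, 1), (1, 2), (2, 3), (3, 4)]
# Hypothetical 2D joint coordinates used directly as per-joint features.
TOY_INPUT = [[0.0, 0.0], [0.1, 0.5], [0.2, 1.0], [0.1, 1.5], [0.0, 2.0]]
```

Calling `alternating_stack(TOY_INPUT, normalized_adjacency(5, TOY_BONES))` returns one feature vector per joint. A trained model would add learned projections, normalization layers, and a regression head that maps the final features to 3D coordinates.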
