Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

Paper title

Global-local Motion Transformer for Unsupervised Skeleton-based Action Learning

Authors

Boeun Kim, Hyung Jin Chang, Jungho Kim, Jin Young Choi

Abstract

We propose a new transformer model for unsupervised learning of skeleton motion sequences. The existing transformer model used for unsupervised skeleton-based action learning learns the instantaneous velocity of each joint from adjacent frames, without global motion information. As a result, it has difficulty learning attention globally over whole-body motions and temporally distant joints. In addition, person-to-person interactions have not been considered in that model. To address the learning of whole-body motion, long-range temporal dynamics, and person-to-person interactions, we design a global and local attention mechanism in which global body motions and local joint motions attend to each other. We also propose a novel pretraining strategy, multi-interval pose displacement prediction, to learn both global and local attention over diverse time ranges. The proposed model successfully learns the local dynamics of the joints and captures global context from the motion sequences. Our model outperforms state-of-the-art models by notable margins on representative benchmarks. Code is available at https://github.com/Boeun-Kim/GL-Transformer.
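
To make the idea of pose displacement over multiple intervals more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sequence is a list of frames, each frame a list of joint coordinates, and that the displacement for interval n at frame t is simply the joint position at t minus the position at t - n, with zeros where no earlier frame exists. The function name, the zero padding, and the choice of intervals are all hypothetical; the abstract does not specify them.

```python
from typing import Dict, List

Joint = List[float]   # e.g. [x, y, z]
Pose = List[Joint]    # one frame: a list of joints


def multi_interval_displacements(
    seq: List[Pose], intervals: List[int]
) -> Dict[int, List[Pose]]:
    """Illustrative only: per-joint displacement seq[t] - seq[t - n] for each interval n.

    Frames with t < n have no earlier reference frame, so they are
    filled with zeros (an assumption made for this sketch).
    """
    targets: Dict[int, List[Pose]] = {}
    for n in intervals:
        frames: List[Pose] = []
        for t, pose in enumerate(seq):
            if t < n:
                # No frame exists n steps back: use a zero displacement.
                frames.append([[0.0] * len(joint) for joint in pose])
            else:
                prev = seq[t - n]
                frames.append(
                    [[a - b for a, b in zip(joint, prev_joint)]
                     for joint, prev_joint in zip(pose, prev)]
                )
        targets[n] = frames
    return targets
```

With an interval of 1 this reduces to the frame-to-frame velocity that the abstract attributes to the earlier model; larger intervals expose displacement over longer time ranges, which is the intuition behind using several intervals at once.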
