Paper Title

A Dual-Masked Auto-Encoder for Robust Motion Capture with Spatial-Temporal Skeletal Token Completion

Paper Authors

Junkun Jiang, Jie Chen, Yike Guo

Paper Abstract

Multi-person motion capture can be challenging due to ambiguities caused by severe occlusion, fast body movement, and complex interactions. Existing frameworks build on 2D pose estimations and triangulate to 3D coordinates by reasoning about the appearance, trajectory, and geometric consistencies among multi-camera observations. However, 2D joint detections are usually incomplete and carry incorrect identity assignments due to limited observation angles, which leads to noisy 3D triangulation results. To overcome this issue, we propose to explore the short-range autoregressive characteristics of skeletal motion using a transformer. First, we propose an adaptive, identity-aware triangulation module to reconstruct 3D joints and identify the missing joints for each identity. To generate complete 3D skeletal motion, we then propose a Dual-Masked Auto-Encoder (D-MAE) that encodes the joint status with both skeletal-structural and temporal position encoding for trajectory completion. D-MAE's flexible masking and encoding mechanism enables arbitrary skeleton definitions to be conveniently deployed under the same framework. To demonstrate the proposed model's capability in handling severe data-loss scenarios, we contribute a high-accuracy and challenging motion capture dataset of multi-person interactions with severe occlusion. Evaluations on both the benchmark and our new dataset demonstrate the efficiency of our proposed model, as well as its advantage over other state-of-the-art methods.
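
The dual positional encoding described in the abstract can be pictured with a short sketch. The PyTorch code below is a minimal illustration under our own assumptions, not the authors' implementation: the module names, embedding dimensions, 17-joint skeleton, 30-frame window, and random-masking scheme are all hypothetical. It only shows the core idea of tokenizing each (frame, joint) observation, adding both a skeletal-structural and a temporal positional embedding, replacing missing joints with a learnable mask token, and letting a transformer encoder regress the completed 3D trajectories.

```python
# Minimal sketch of the dual-masking idea: each (frame, joint) token carries a
# skeletal-structural and a temporal positional embedding; missing joints are
# replaced by a learnable mask token; a transformer reconstructs 3D trajectories.
# Dimensions, layer counts, and the masking scheme are illustrative assumptions.
import torch
import torch.nn as nn


class DualMaskedAutoEncoderSketch(nn.Module):
    def __init__(self, num_joints: int, num_frames: int, dim: int = 128):
        super().__init__()
        self.in_proj = nn.Linear(3, dim)                 # 3D joint -> token
        self.joint_emb = nn.Embedding(num_joints, dim)   # skeletal-structural encoding
        self.frame_emb = nn.Embedding(num_frames, dim)   # temporal encoding
        self.mask_token = nn.Parameter(torch.zeros(dim)) # stands in for missing joints
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.out_proj = nn.Linear(dim, 3)                # token -> reconstructed 3D joint

    def forward(self, joints: torch.Tensor, missing: torch.Tensor) -> torch.Tensor:
        # joints:  (B, T, J, 3) noisy/incomplete triangulated 3D joints
        # missing: (B, T, J) boolean, True where the joint was not observed
        B, T, J, _ = joints.shape
        tok = self.in_proj(joints)                                        # (B, T, J, D)
        tok = torch.where(missing.unsqueeze(-1),
                          self.mask_token.expand_as(tok), tok)            # mask missing joints
        frame_ids = torch.arange(T, device=joints.device)
        joint_ids = torch.arange(J, device=joints.device)
        tok = tok + self.frame_emb(frame_ids)[None, :, None, :]           # temporal position
        tok = tok + self.joint_emb(joint_ids)[None, None, :, :]           # skeletal position
        tok = tok.reshape(B, T * J, -1)                                   # flatten to token sequence
        out = self.encoder(tok)
        return self.out_proj(out).reshape(B, T, J, 3)                     # completed skeletal motion


# Usage: complete a hypothetical 17-joint skeleton over a 30-frame window.
model = DualMaskedAutoEncoderSketch(num_joints=17, num_frames=30)
joints = torch.randn(2, 30, 17, 3)
missing = torch.rand(2, 30, 17) < 0.3   # simulate joints lost to occlusion
completed = model(joints, missing)      # (2, 30, 17, 3)
```

Because the skeleton only enters through the per-joint embedding table, a different skeleton definition in this sketch amounts to changing `num_joints`, which mirrors the paper's claim that arbitrary skeleton definitions can be deployed under the same framework.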
