A generic diffusion-based approach for 3D human pose prediction in the wild

Paper title

A generic diffusion-based approach for 3D human pose prediction in the wild

Authors

Saeed Saadatnejad, Ali Rasekh, Mohammadreza Mofayezi, Yasamin Medghalchi, Sara Rajabzadeh, Taylor Mordan, Alexandre Alahi

Abstract

Predicting 3D human poses in real-world scenarios, also known as human pose forecasting, is inevitably subject to noisy inputs arising from inaccurate 3D pose estimations and occlusions. To address these challenges, we propose a diffusion-based approach that can make predictions from noisy observations. We frame the prediction task as a denoising problem, where both observation and prediction are considered as a single sequence containing missing elements (whether in the observation or prediction horizon). All missing elements are treated as noise and denoised with our conditional diffusion model. To better handle long-term forecasting horizons, we present a temporal cascaded diffusion model. We demonstrate the benefits of our approach on four publicly available datasets (Human3.6M, HumanEva-I, AMASS, and 3DPW), outperforming the state-of-the-art. Additionally, we show that our framework is generic enough to improve any 3D pose prediction model, as a pre-processing step to repair its inputs and a post-processing step to refine its outputs. The code is available online: https://github.com/vita-epfl/DePOSit
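
The framing in the abstract, where observation and prediction form a single sequence whose missing elements are treated as noise, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_masked_sequence`, the nested-list pose layout, the use of `None` for occluded joints, and the unit-variance Gaussian fill are all assumptions made for illustration. The sketch only prepares the input a conditional denoiser might consume; the diffusion model itself is out of scope.

```python
import random


def build_masked_sequence(observed, pred_len, rng=None):
    """Concatenate observation and prediction horizon into one sequence.

    observed: list of frames; each frame is a list of joints; each joint is
              an (x, y, z) tuple, or None when the joint is occluded.
    pred_len: number of future frames to predict.
    Returns (sequence, mask). Missing entries are filled with Gaussian noise;
    mask is 1 where a value was observed and 0 where it is missing.
    (Hypothetical helper, not part of the DePOSit code base.)
    """
    if rng is None:
        rng = random.Random(0)
    num_joints = len(observed[0])

    def noise():
        # Assumed placeholder: standard normal noise per coordinate.
        return tuple(rng.gauss(0.0, 1.0) for _ in range(3))

    sequence, mask = [], []
    # Observation horizon: keep known joints, replace occluded ones with noise.
    for frame in observed:
        sequence.append([j if j is not None else noise() for j in frame])
        mask.append([1 if j is not None else 0 for j in frame])
    # Prediction horizon: every joint is missing, so every entry is noise.
    for _ in range(pred_len):
        sequence.append([noise() for _ in range(num_joints)])
        mask.append([0] * num_joints)
    return sequence, mask


observed = [
    [(0.0, 1.0, 0.0), (0.1, 0.5, 0.0)],
    [(0.0, 1.0, 0.1), None],  # second joint occluded in this frame
]
sequence, mask = build_masked_sequence(observed, pred_len=3)
```

In this layout an occluded joint in the observation and an unseen future joint are handled identically: both have a mask value of 0 and a noise placeholder, which is what allows one denoiser to both repair inputs and forecast.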
