Paper Title

DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models

Authors

Karl Holmquist, Bastian Wandt

Abstract

Traditionally, monocular 3D human pose estimation employs a machine learning model to predict the most likely 3D pose for a given input image. However, a single image can be highly ambiguous and induces multiple plausible solutions for the 2D-3D lifting step which results in overly confident 3D pose predictors. To this end, we propose DiffPose, a conditional diffusion model, that predicts multiple hypotheses for a given input image. In comparison to similar approaches, our diffusion model is straightforward and avoids intensive hyperparameter tuning, complex network structures, mode collapse, and unstable training. Moreover, we tackle a problem of the common two-step approach that first estimates a distribution of 2D joint locations via joint-wise heatmaps and consecutively approximates them based on first- or second-moment statistics. Since such a simplification of the heatmaps removes valid information about possibly correct, though labeled unlikely, joint locations, we propose to represent the heatmaps as a set of 2D joint candidate samples. To extract information about the original distribution from these samples we introduce our embedding transformer that conditions the diffusion model. Experimentally, we show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
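
The contrast the abstract draws between a first-moment summary of a heatmap and a set of sampled 2D joint candidates can be illustrated with a small toy example. The following is only a minimal sketch and not the authors' implementation: the function names `sample_joint_candidates` and `heatmap_mean` are hypothetical, the heatmap is assumed to be a nested list of non-negative scores, and sampling pixels in proportion to their score is an assumed scheme that the abstract does not specify.

```python
import random


def sample_joint_candidates(heatmap, num_samples, rng=None):
    """Draw (x, y) pixel candidates with probability proportional to the heatmap value.

    Hypothetical helper for illustration; the sampling scheme is an assumption.
    """
    rng = rng or random.Random()
    height, width = len(heatmap), len(heatmap[0])
    # Flatten the heatmap so that every pixel contributes one sampling weight.
    weights = [max(value, 0.0) for row in heatmap for value in row]
    if sum(weights) <= 0.0:
        raise ValueError("heatmap has no positive mass")
    flat_indices = rng.choices(range(height * width), weights=weights, k=num_samples)
    # Convert flat indices back to (x, y) = (column, row).
    return [(index % width, index // width) for index in flat_indices]


def heatmap_mean(heatmap):
    """First-moment summary: the expected (x, y) location under the heatmap."""
    total = sum(value for row in heatmap for value in row)
    mean_x = sum(x * value for row in heatmap for x, value in enumerate(row)) / total
    mean_y = sum(y * value for y, row in enumerate(heatmap) for value in row) / total
    return mean_x, mean_y


# Toy bimodal heatmap: a likely joint location at (1, 1) and a less likely one at (3, 1).
heatmap = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.6, 0.0, 0.4, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

candidates = sample_joint_candidates(heatmap, num_samples=8, rng=random.Random(0))
mean_xy = heatmap_mean(heatmap)
```

In this toy case the mean falls between the two peaks, at a location where the heatmap itself is zero, whereas every sampled candidate lies on one of the two peaks. This is the kind of information the abstract argues is lost when a heatmap is reduced to moment statistics.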
