DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models

Paper title

DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models

Authors

Karl Holmquist, Bastian Wandt

Abstract

Traditionally, monocular 3D human pose estimation employs a machine learning model to predict the most likely 3D pose for a given input image. However, a single image can be highly ambiguous and induce multiple plausible solutions for the 2D-3D lifting step, which results in overly confident 3D pose predictors. To this end, we propose DiffPose, a conditional diffusion model that predicts multiple hypotheses for a given input image. In comparison to similar approaches, our diffusion model is straightforward and avoids intensive hyperparameter tuning, complex network structures, mode collapse, and unstable training. Moreover, we tackle a problem of the common two-step approach, which first estimates a distribution of 2D joint locations via joint-wise heatmaps and subsequently approximates them based on first- or second-moment statistics. Since such a simplification of the heatmaps removes valid information about possibly correct, though labeled unlikely, joint locations, we propose to represent the heatmaps as a set of 2D joint candidate samples. To extract information about the original distribution from these samples, we introduce our embedding transformer, which conditions the diffusion model. Experimentally, we show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
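
The abstract's argument about heatmap simplification can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names, the 5x5 heatmap, and its values are assumptions made up for illustration. It contrasts two summary statistics (the argmax and the first moment) with a set of 2D candidates sampled in proportion to the heatmap values, for a heatmap that has two plausible joint locations.

```python
import random


def all_coords(heatmap):
    """List every (x, y) pixel coordinate of a 2D heatmap given as nested lists."""
    return [(x, y) for y in range(len(heatmap)) for x in range(len(heatmap[0]))]


def argmax_location(heatmap):
    """Single most likely pixel: discards every other mode."""
    return max(all_coords(heatmap), key=lambda c: heatmap[c[1]][c[0]])


def mean_location(heatmap):
    """First-moment summary: the heatmap-weighted average position."""
    total = sum(sum(row) for row in heatmap)
    mean_x = sum(x * v for row in heatmap for x, v in enumerate(row)) / total
    mean_y = sum(y * v for y, row in enumerate(heatmap) for v in row) / total
    return mean_x, mean_y


def sample_joint_candidates(heatmap, num_samples, rng):
    """Draw (x, y) candidates with probability proportional to the heatmap value."""
    coords = all_coords(heatmap)
    weights = [heatmap[y][x] for (x, y) in coords]
    return rng.choices(coords, weights=weights, k=num_samples)


# Hypothetical ambiguous heatmap: a stronger mode at (1, 1), a weaker one at (3, 3).
heatmap = [[0.0] * 5 for _ in range(5)]
heatmap[1][1] = 0.6
heatmap[3][3] = 0.4

rng = random.Random(0)  # fixed seed so the sketch is reproducible
candidates = sample_joint_candidates(heatmap, 200, rng)

best = argmax_location(heatmap)   # keeps only the stronger mode
mean = mean_location(heatmap)     # falls between the modes, where the heatmap is zero
modes_seen = set(candidates)      # the sample set retains both modes
```

In this toy case the argmax drops the weaker but still plausible location, and the mean lands on a pixel the heatmap assigns no mass to, while the sampled candidates keep both locations available to a downstream model.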
