Monocular Expressive Body Regression through Body-Driven Attention

Paper title

Monocular Expressive Body Regression through Body-Driven Attention

Authors

Vasileios Choutas, Georgios Pavlakos, Timo Bolkart, Dimitrios Tzionas, Michael J. Black

Abstract

To understand how people look, interact, or perform tasks, we need to quickly and accurately capture their 3D body, face, and hands together from an RGB image. Most existing methods focus only on parts of the body. A few recent approaches reconstruct full expressive 3D humans from images using 3D body models that include the face and hands. These methods are optimization-based and thus slow, prone to local optima, and require 2D keypoints as input. We address these limitations by introducing ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands, in SMPL-X format, from an RGB image. This is a hard problem due to the high dimensionality of the body and the lack of expressive training data. Additionally, hands and faces are much smaller than the body, occupying very few image pixels. This makes hand and face estimation hard when body images are downscaled for neural networks. We make three main contributions. First, we account for the lack of training data by curating a dataset of SMPL-X fits on in-the-wild images. Second, we observe that body estimation localizes the face and hands reasonably well. We introduce body-driven attention for face and hand regions in the original image to extract higher-resolution crops that are fed to dedicated refinement modules. Third, these modules exploit part-specific knowledge from existing face- and hand-only datasets. ExPose estimates expressive 3D humans more accurately than existing optimization methods at a small fraction of the computational cost. Our data, model and code are available for research at https://expose.is.tue.mpg.de .
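The body-driven attention step described in the abstract can be illustrated with a minimal sketch. It assumes the body network has already produced 2D joint locations for a part (for example a hand) in the downscaled body image, and it shows one plausible way to map those joints back to the original image and cut out a higher-resolution crop. The function names, the `downscale` factor, and the padding `scale` are hypothetical choices for illustration, not the paper's actual implementation.

```python
import numpy as np


def part_bbox(joints_2d, scale=1.5):
    """Return (center_x, center_y, side) of a square box around 2D joints.

    The box side is the larger extent of the joints, enlarged by `scale`
    (an assumed padding factor) so the whole part fits inside the crop.
    """
    mins = joints_2d.min(axis=0)
    maxs = joints_2d.max(axis=0)
    center = (mins + maxs) / 2.0
    side = float((maxs - mins).max()) * scale
    return center[0], center[1], side


def crop_from_original(image, joints_lowres, downscale, scale=1.5):
    """Crop a part region from the full-resolution image.

    image:         (H, W, C) array, the original image.
    joints_lowres: (N, 2) array of (x, y) joints in the downscaled body image.
    downscale:     factor by which the original was shrunk for the body network.
    """
    # Map the joints back to original pixel coordinates.
    joints_full = joints_lowres * downscale
    cx, cy, side = part_bbox(joints_full, scale)
    half = side / 2.0

    # Clamp the box to the image bounds before slicing.
    h, w = image.shape[:2]
    x0 = int(max(0, np.floor(cx - half)))
    y0 = int(max(0, np.floor(cy - half)))
    x1 = int(min(w, np.ceil(cx + half)))
    y1 = int(min(h, np.ceil(cy + half)))
    return image[y0:y1, x0:x1]
```

In a pipeline of this kind, the resulting crop would be resized to the input size of the face or hand refinement module. The point of cropping from the original image rather than from the downscaled body image is that the part keeps more of its pixels.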

Paper title