Paper Title
Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations
Paper Authors
Paper Abstract
The goal of many computer vision systems is to transform image pixels into 3D representations. Recent popular models use neural networks to regress directly from pixels to 3D object parameters. Such an approach works well when supervision is available, but in problems like human pose and shape estimation, it is difficult to obtain natural images with 3D ground truth. To go one step further, we propose a new architecture that facilitates unsupervised, or lightly supervised, learning. The idea is to break the problem into a series of transformations between increasingly abstract representations. Each step involves a cycle designed to be learnable without annotated training data, and the chain of cycles delivers the final solution. Specifically, we use 2D body part segments as an intermediate representation that contains enough information to be lifted to 3D, and at the same time is simple enough to be learned in an unsupervised way. We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images. We also explore varying amounts of paired data and show that cycling greatly alleviates the need for paired data. While we present results for modeling humans, our formulation is general and can be applied to other vision problems.
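To make the chained-cycle idea concrete, here is a minimal illustrative sketch, not the authors' code. It assumes PyTorch, and the network architectures, tensor sizes, module names (img2seg, seg2img, seg2par, par2seg), and loss weights are placeholder assumptions chosen only to keep the example self-contained and runnable. It shows an image-to-2D-part-segmentation cycle chained with a segmentation-to-3D-parameter cycle, each trained with a reconstruction (cycle-consistency) loss rather than paired 3D ground truth.

```python
# Minimal sketch of chained representation cycling (illustrative assumptions only).
import torch
import torch.nn as nn

B, C, H, W = 4, 3, 64, 64   # batch of RGB images (assumed size)
P = 6                       # number of 2D body-part channels (assumed)
D = 82                      # 3D pose/shape parameter dimension (assumed)

img2seg = nn.Sequential(nn.Conv2d(C, P, 3, padding=1))           # image -> part segments
seg2img = nn.Sequential(nn.Conv2d(P, C, 3, padding=1))           # part segments -> image
seg2par = nn.Sequential(nn.Flatten(), nn.Linear(P * H * W, D))   # segments -> 3D parameters
par2seg = nn.Sequential(nn.Linear(D, P * H * W),                 # 3D parameters -> segments
                        nn.Unflatten(1, (P, H, W)))              # (stand-in for a renderer)

images = torch.rand(B, C, H, W)      # unannotated natural images
real_segs = torch.rand(B, P, H, W)   # unpaired part segmentations (e.g. synthetic)

l1 = nn.L1Loss()

# Cycle 1: image -> segments -> image, learnable without annotations
# (in practice combined with adversarial losses on the segment domain).
segs_hat = img2seg(images)
loss_cycle1 = l1(seg2img(segs_hat), images)

# Cycle 2: segments -> 3D parameters -> segments (lift to 3D, project back).
params_hat = seg2par(real_segs)
loss_cycle2 = l1(par2seg(params_hat), real_segs)

# Chaining: segments predicted from images feed the 3D cycle, so gradients
# flow from the 3D stage back to the pixel stage without paired 3D labels.
loss_chain = l1(par2seg(seg2par(segs_hat)), segs_hat)

total_loss = loss_cycle1 + loss_cycle2 + loss_chain
total_loss.backward()
print(float(total_loss))
```

In this sketch the 2D part segmentation is the intermediate representation the abstract describes: simple enough that the image-side cycle can be learned without annotations, yet informative enough for the second cycle to lift it to 3D pose and shape parameters.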