Paper Title
Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons
Paper Authors
Paper Abstract
In multi-person 2D pose estimation, bottom-up methods simultaneously predict poses for all persons and, unlike top-down methods, do not rely on human detection. However, the accuracy of SOTA bottom-up methods is still inferior to that of existing top-down methods. This is because the predicted human poses are regressed from inconsistent human bounding box centers and lack human-scale normalization, leading to inaccurate predicted poses and missed small-scale persons. To push the envelope of bottom-up pose estimation, we first propose multi-scale training to enhance the network's ability to handle scale variation with single-scale testing, particularly for small-scale persons. Second, we introduce dual anatomical centers (i.e., head and body), from which human poses can be predicted more accurately and reliably, especially for small-scale persons. Moreover, existing bottom-up methods use multi-scale testing to boost pose estimation accuracy at the cost of multiple additional forward passes, which weakens the efficiency of bottom-up methods, their core strength over top-down methods. By contrast, our multi-scale training enables the model to predict high-quality poses in a single forward pass (i.e., single-scale testing). Our method achieves a 38.4% improvement in bounding box precision and a 39.1% improvement in bounding box recall over the state of the art (SOTA) on the challenging small-scale persons subset of COCO. For human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with single-scale testing. We also achieve the top performance (40.3 AP) on the OCHuman dataset in cross-dataset evaluation.
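To make the dual-anatomical-center idea concrete, the following is a minimal, hypothetical Python sketch of how keypoints regressed from two centers (head and body) could be decoded and fused. The function name, array shapes, and confidence-based fusion rule are illustrative assumptions for exposition only, not the paper's actual implementation.

import numpy as np

def decode_dual_center_pose(head_center, body_center,
                            head_offsets, body_offsets,
                            head_conf, body_conf):
    # head_center, body_center: (2,) arrays holding the predicted (x, y) of each anatomical center.
    # head_offsets, body_offsets: (K, 2) arrays of keypoint offsets regressed from each center.
    # head_conf, body_conf: scalar confidences of the two center predictions.
    pose_from_head = head_center + head_offsets   # keypoints anchored at the head center
    pose_from_body = body_center + body_offsets   # keypoints anchored at the body center
    # Simple fusion rule (assumed here): keep the pose anchored at the more confident center.
    return pose_from_head if head_conf >= body_conf else pose_from_body

# Toy usage with K = 3 keypoints.
head_center = np.array([50.0, 40.0])
body_center = np.array([52.0, 80.0])
head_offsets = np.array([[0.0, 0.0], [2.0, 60.0], [-2.0, 60.0]])
body_offsets = np.array([[-2.0, -40.0], [0.0, 20.0], [-4.0, 20.0]])
pose = decode_dual_center_pose(head_center, body_center,
                               head_offsets, body_offsets,
                               head_conf=0.9, body_conf=0.7)
print(pose)  # (3, 2) array of fused keypoint coordinates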