Efficient Human Pose Estimation via 3D Event Point Cloud

Paper Title

Efficient Human Pose Estimation via 3D Event Point Cloud

Authors

Jiaan Chen, Hao Shi, Yaozu Ye, Kailun Yang, Lei Sun, Kaiwei Wang

Abstract

Human Pose Estimation (HPE) based on RGB images has developed rapidly, benefiting from deep learning. However, event-based HPE has not been fully studied, although it holds great potential for applications in extreme scenes and efficiency-critical conditions. In this paper, we are the first to estimate 2D human pose directly from a 3D event point cloud. We propose a novel event representation, the rasterized event point cloud, which aggregates events at the same position within a small time slice. It preserves 3D features from multiple statistical cues while significantly reducing memory consumption and computational complexity, and it proves efficient in our experiments. We then feed the rasterized event point cloud into three different backbones, PointNet, DGCNN, and Point Transformer, each followed by two linear-layer decoders that predict the locations of human keypoints. We find that, with our method, PointNet achieves promising results at much higher speed, whereas Point Transformer reaches much higher accuracy, even approaching previous event-frame-based methods. A comprehensive set of results shows that our proposed method is consistently effective across these 3D backbone models for event-driven human pose estimation. Our PointNet-based method with 2048 input points achieves an MPJPE3D of 82.46mm on the DHP19 dataset with a latency of only 12.29ms on an NVIDIA Jetson Xavier NX edge computing platform, making it well suited to real-time detection with event cameras. Code is available at https://github.com/MasterHow/EventPointPose.
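The abstract describes the rasterized event point cloud only at a high level. Below is a minimal Python sketch of the idea under stated assumptions, not the authors' code (the linked repository has that). It assumes events are `(x, y, t, p)` tuples with polarity ±1, that the window is split into `num_slices` equal time slices, and that each aggregated point carries three statistical cues: mean normalized timestamp, summed polarity, and event count. The function names `rasterize_events` and `sample_points`, and the subsample/pad strategy for reaching a fixed input size such as 2048 points, are hypothetical illustrations.

```python
import random
from collections import defaultdict
from typing import List, Tuple

# Assumed event layout: (x, y, t, p) = pixel column, pixel row, timestamp, polarity (+1/-1).
Event = Tuple[int, int, float, int]
# Assumed rasterized point layout: (x, y, mean normalized timestamp, polarity sum, event count).
RasterPoint = Tuple[int, int, float, int, int]


def rasterize_events(events: List[Event], num_slices: int) -> List[RasterPoint]:
    """Collapse all events at the same pixel within one time slice into a single point.

    The three per-point cues (mean timestamp, summed polarity, count) are an
    assumption for illustration; the paper's exact feature set may differ.
    """
    if not events:
        return []
    if num_slices < 1:
        raise ValueError("num_slices must be >= 1")
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero when all timestamps match

    # Accumulators keyed by (slice index, x, y): [timestamp sum, polarity sum, count]
    acc = defaultdict(lambda: [0.0, 0, 0])
    for x, y, t, p in events:
        t_norm = (t - t_min) / span  # normalize timestamps to [0, 1] over the window
        k = min(int(t_norm * num_slices), num_slices - 1)  # the last event goes in the final slice
        cell = acc[(k, x, y)]
        cell[0] += t_norm
        cell[1] += p
        cell[2] += 1

    # Emit one point per occupied (slice, pixel) cell, ordered by slice then position.
    return [
        (x, y, t_sum / count, p_sum, count)
        for (k, x, y), (t_sum, p_sum, count) in sorted(acc.items())
    ]


def sample_points(points: List[RasterPoint], n: int, seed: int = 0) -> List[RasterPoint]:
    """Hypothetical helper: bring the cloud to exactly n points for a fixed-size backbone input."""
    if not points:
        return []
    rng = random.Random(seed)
    if len(points) >= n:
        return rng.sample(points, n)  # subsample without replacement
    # Pad by repeating randomly chosen points (an assumed strategy).
    return points + [rng.choice(points) for _ in range(n - len(points))]


if __name__ == "__main__":
    # Timestamps in microseconds; three events hit pixel (10, 20) early in the window.
    demo = [(10, 20, 0, 1), (10, 20, 1000, 1), (10, 20, 2000, -1), (11, 20, 9000, 1), (10, 20, 10000, 1)]
    cloud = rasterize_events(demo, num_slices=2)
    print(len(cloud), "points from", len(demo), "events")
    print(len(sample_points(cloud, 2048)), "points after sampling")
```

In this sketch, the three early events at pixel (10, 20) become a single point with count 3 and polarity sum 1. That collapse of repeated events is the intuition behind the memory and compute savings the abstract claims. A point-based backbone such as PointNet would then consume the resulting N×5 set of points.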
