Paper Title
PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images
Paper Authors
Paper Abstract
In this paper, we propose PETRv2, a unified framework for 3D perception from multi-view images. Based on PETR, PETRv2 explores the effectiveness of temporal modeling, which utilizes the temporal information of previous frames to boost 3D object detection. More specifically, we extend the 3D position embedding (3D PE) in PETR for temporal modeling. The 3D PE achieves temporal alignment of object positions across different frames. A feature-guided position encoder is further introduced to improve the data adaptability of 3D PE. To support multi-task learning (e.g., BEV segmentation and 3D lane detection), PETRv2 provides a simple yet effective solution by introducing task-specific queries, which are initialized under different spaces. PETRv2 achieves state-of-the-art performance on 3D object detection, BEV segmentation, and 3D lane detection. A detailed robustness analysis is also conducted on the PETR framework. We hope PETRv2 can serve as a strong baseline for 3D perception. Code is available at \url{https://github.com/megvii-research/PETR}.
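The abstract names two mechanisms: a feature-guided position encoder that modulates the 3D PE with 2D image features, and temporal alignment of 3D positions from a previous frame into the current frame. The sketch below illustrates both ideas in a minimal, hypothetical form; it is not the authors' implementation, and the class, function, and tensor-shape choices (FeatureGuidedPositionEncoder, align_previous_frame, the pose-matrix arguments) are assumptions for illustration only.

```python
import torch
import torch.nn as nn


class FeatureGuidedPositionEncoder(nn.Module):
    """Sketch of a feature-guided 3D position embedding (assumed design):
    a gating vector predicted from 2D image features rescales the
    embedding of each token's 3D frustum coordinates."""

    def __init__(self, coord_dim=192, feat_dim=256, embed_dim=256):
        super().__init__()
        # maps flattened 3D coordinates of one image token to an embedding
        self.pos_mlp = nn.Sequential(
            nn.Linear(coord_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim))
        # predicts a per-channel gate from the token's 2D feature
        self.feat_gate = nn.Sequential(
            nn.Linear(feat_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim))

    def forward(self, coords_3d, img_feat):
        # coords_3d: (N, coord_dim) flattened 3D points per token
        # img_feat:  (N, feat_dim)  2D image features of the same tokens
        pe = self.pos_mlp(coords_3d)
        gate = torch.sigmoid(self.feat_gate(img_feat))
        return gate * pe  # feature-guided 3D position embedding


def align_previous_frame(points_prev, prev_ego_to_global, global_to_curr_ego):
    """Transform 3D points from the previous frame's ego coordinates into
    the current frame's ego coordinates using 4x4 pose matrices (assumed
    to be available from the dataset's ego-pose annotations)."""
    ones = torch.ones_like(points_prev[:, :1])
    pts_h = torch.cat([points_prev, ones], dim=-1)      # (N, 4) homogeneous
    rel_pose = global_to_curr_ego @ prev_ego_to_global  # (4, 4) relative pose
    return (pts_h @ rel_pose.T)[:, :3]                  # (N, 3) aligned points
```

Under this reading, the aligned coordinates from the previous frame are embedded with the same encoder as the current frame, so the 3D PE of both frames lives in one coordinate system and the transformer can attend across time without any explicit feature warping.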