Paper title
PolarFormer: Multi-camera 3D Object Detection with Polar Transformer
Paper authors
Paper abstract
3D object detection in autonomous driving aims to reason about "what" and "where" the objects of interest are in a 3D world. Following the conventional wisdom of previous 2D object detection, existing methods often adopt the canonical Cartesian coordinate system with perpendicular axes. However, we conjecture that this does not fit the nature of the ego car's perspective, as each onboard camera perceives the world in the shape of a wedge, intrinsic to the imaging geometry, with radial (non-perpendicular) axes. Hence, in this paper we advocate the exploitation of the Polar coordinate system and propose a new Polar Transformer (PolarFormer) for more accurate 3D object detection in the bird's-eye-view (BEV), taking as input only multi-camera 2D images. Specifically, we design a cross-attention based Polar detection head without restriction on the shape of the input structure, so that it can deal with irregular Polar grids. To tackle the unconstrained object scale variations along the Polar distance dimension, we further introduce a multi-scale Polar representation learning strategy. As a result, our model can make best use of the Polar representation, rasterized by attending to the corresponding image observations in a sequence-to-sequence fashion subject to geometric constraints. Thorough experiments on the nuScenes dataset demonstrate that PolarFormer significantly outperforms state-of-the-art 3D object detection alternatives.
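To make the Polar BEV idea concrete, the following is a minimal sketch, not the paper's implementation. It converts a point in the ego vehicle's ground plane from Cartesian (x, y) to Polar (rho, phi) and bins it into a Polar grid. The function names, the 50 m range, and the bin counts are all hypothetical values chosen for illustration. The last helper shows why object scale varies along the distance dimension: the arc length covered by one angular bin grows linearly with rho.

```python
import math


def cartesian_to_polar(x, y):
    """Convert a BEV point (x, y) in the ego frame to (rho, phi).

    rho is the distance from the ego vehicle, phi the azimuth in radians.
    """
    rho = math.hypot(x, y)
    phi = math.atan2(y, x)
    return rho, phi


def polar_cell_index(x, y, max_range=50.0, num_rho_bins=64, num_phi_bins=256):
    """Return the (radial, angular) cell of a point, or None if out of range.

    max_range and the bin counts are assumed values, not taken from the paper.
    """
    rho, phi = cartesian_to_polar(x, y)
    if rho >= max_range:
        return None
    r_idx = int(rho / max_range * num_rho_bins)
    # Shift phi from [-pi, pi] to [0, 2*pi] before binning; wrap the last edge.
    a_idx = int((phi + math.pi) / (2 * math.pi) * num_phi_bins) % num_phi_bins
    return r_idx, a_idx


def cell_arc_length(rho, num_phi_bins=256):
    """Arc length spanned by one angular bin at distance rho."""
    return rho * 2 * math.pi / num_phi_bins
```

Under these assumptions, a cell at 40 m spans four times the arc length of a cell at 10 m, so an object of fixed physical size occupies fewer angular bins the farther away it is. This is the kind of scale variation that a multi-scale representation is meant to absorb.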