Paper Title

BEVSegFormer: Bird's Eye View Semantic Segmentation From Arbitrary Camera Rigs

Paper Authors

Lang Peng, Zhirong Chen, Zhangjie Fu, Pengpeng Liang, Erkang Cheng

Paper Abstract

Semantic segmentation in bird's eye view (BEV) is an important task for autonomous driving. Though this task has attracted a large amount of research effort, it is still challenging to flexibly cope with arbitrary (single or multiple) camera sensors equipped on the autonomous vehicle. In this paper, we present BEVSegFormer, an effective transformer-based method for BEV semantic segmentation from arbitrary camera rigs. Specifically, our method first encodes image features from arbitrary cameras with a shared backbone. These image features are then enhanced by a deformable transformer-based encoder. Moreover, we introduce a BEV transformer decoder module to parse BEV semantic segmentation results. An efficient multi-camera deformable attention unit is designed to carry out the BEV-to-image view transformation. Finally, the queries are reshaped according to the layout of grids in the BEV, and upsampled to produce the semantic segmentation result in a supervised manner. We evaluate the proposed algorithm on the public nuScenes dataset and a self-collected dataset. Experimental results show that our method achieves promising performance on BEV semantic segmentation from arbitrary camera rigs. We also demonstrate the effectiveness of each component via ablation studies.
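
The abstract outlines a pipeline of shared-backbone image encoding, a deformable transformer encoder, and a BEV transformer decoder whose queries are reshaped to the BEV grid and upsampled. The following is a minimal, hypothetical PyTorch sketch of that pipeline; the class and parameter names (BEVSegFormerSketch, bev_h, bev_w, embed_dim) and the plain nn.Transformer layers standing in for the deformable attention units are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class BEVSegFormerSketch(nn.Module):
    """Toy stand-in for the pipeline described in the abstract (not the authors' code)."""

    def __init__(self, num_classes, bev_h=25, bev_w=25, embed_dim=256):
        super().__init__()
        self.bev_h, self.bev_w = bev_h, bev_w
        # Shared backbone: the same CNN encodes images from every camera.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, embed_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
            nn.Conv2d(embed_dim, embed_dim, kernel_size=3, stride=2, padding=1),
        )
        # Plain transformer encoder standing in for the deformable transformer encoder.
        enc_layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # One learnable query per BEV grid cell; the decoder's cross-attention over the
        # concatenated camera features stands in for the multi-camera deformable attention.
        self.bev_queries = nn.Embedding(bev_h * bev_w, embed_dim)
        dec_layer = nn.TransformerDecoderLayer(embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
        # Queries are reshaped to the BEV grid and upsampled into the segmentation map.
        self.seg_head = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(embed_dim, num_classes, kernel_size=1),
        )

    def forward(self, images):
        # images: (B, N_cam, 3, H, W) from an arbitrary number of cameras.
        b, n = images.shape[:2]
        feats = self.backbone(images.flatten(0, 1))        # (B*N, C, h, w)
        tokens = feats.flatten(2).transpose(1, 2)          # (B*N, h*w, C)
        tokens = self.encoder(tokens)                      # enhanced image features
        tokens = tokens.reshape(b, -1, tokens.shape[-1])   # concatenate cameras per sample
        queries = self.bev_queries.weight.unsqueeze(0).expand(b, -1, -1)
        bev = self.decoder(queries, tokens)                # BEV-to-image cross attention
        bev = bev.transpose(1, 2).reshape(b, -1, self.bev_h, self.bev_w)
        return self.seg_head(bev)                          # (B, num_classes, 4*bev_h, 4*bev_w)


if __name__ == "__main__":
    model = BEVSegFormerSketch(num_classes=4)
    out = model(torch.randn(2, 6, 3, 64, 64))  # e.g. six surround-view cameras
    print(out.shape)  # torch.Size([2, 4, 100, 100])
```

Because the BEV queries are tied only to grid cells rather than to a fixed camera geometry, the same decoder accepts features from any number of cameras, which is the property the abstract highlights for arbitrary camera rigs.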
