Aerial Monocular 3D Object Detection

Paper title

Aerial Monocular 3D Object Detection

Authors

Yue Hu, Shaoheng Fang, Weidi Xie, Siheng Chen

Abstract

Drones equipped with cameras can significantly enhance the human ability to perceive the world because of their remarkable maneuverability in 3D space. Ironically, object detection for drones has always been conducted in the 2D image space, which fundamentally limits their ability to understand 3D scenes. Furthermore, existing 3D object detection methods developed for autonomous driving cannot be directly applied to drones because they lack deformation modeling, which is essential for the distant aerial perspective with its sensitive distortion and small objects. To fill the gap, this work proposes a dual-view detection system named DVDET to achieve aerial monocular object detection in both the 2D image space and the 3D physical space. To address the severe view deformation issue, we propose a novel trainable geo-deformable transformation module that can properly warp information from the drone's perspective to the bird's-eye view (BEV). Compared to the monocular methods for cars, our transformation includes a learnable deformable network for explicitly revising the severe deviation. To address the dataset challenge, we propose a new large-scale simulation dataset named AM3D-Sim, generated by the co-simulation of AirSim and CARLA, and a new real-world aerial dataset named AM3D-Real, collected by a DJI Matrice 300 RTK. Both datasets provide high-quality annotations for 3D object detection. Extensive experiments show that i) aerial monocular 3D object detection is feasible; ii) the model pre-trained on the simulation dataset benefits real-world performance; and iii) DVDET also benefits monocular 3D object detection for cars. To encourage more researchers to investigate this area, we will release the dataset and related code at https://github.com/PhyllisH/DVDET.
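
To make the idea of warping from the drone's perspective to the BEV concrete, here is a minimal NumPy sketch of the purely geometric part of such a warp. It is not the DVDET implementation: it assumes a flat ground plane (Z = 0), a pinhole camera with known intrinsics `K` and world-to-camera pose `R`, `t`, and nearest-neighbour sampling. The function names and the optional `offsets` argument are hypothetical; `offsets` only stands in for the kind of per-cell correction a learnable deformable network might predict.

```python
import numpy as np


def ground_to_image_homography(K, R, t):
    """Homography taking ground-plane points (X, Y, 1), with Z = 0, to pixels.

    Assumes x_cam = R @ x_world + t, so a ground point projects as
    K @ [r1 r2 t] @ (X, Y, 1).
    """
    return K @ np.column_stack((R[:, 0], R[:, 1], t))


def warp_to_bev(image, H, x_range, y_range, resolution, offsets=None):
    """Sample `image` on a regular ground grid (nearest neighbour).

    `offsets` (hypothetical) has shape (rows, cols, 2) and is added to the
    projected (u, v) pixel coordinates before sampling.
    """
    xs = np.arange(x_range[0], x_range[1], resolution)
    ys = np.arange(y_range[0], y_range[1], resolution)
    gx, gy = np.meshgrid(xs, ys)  # each of shape (len(ys), len(xs))
    ground = np.stack([gx, gy, np.ones_like(gx)], axis=-1)

    pix = ground @ H.T            # homogeneous pixel coordinates
    depth = pix[..., 2]
    valid = depth > 1e-6          # keep only points in front of the camera
    safe_depth = np.where(valid, depth, 1.0)
    uv = pix[..., :2] / safe_depth[..., None]
    if offsets is not None:
        uv = uv + offsets

    u = np.rint(uv[..., 0]).astype(int)
    v = np.rint(uv[..., 1]).astype(int)
    height, width = image.shape[:2]
    valid &= (u >= 0) & (u < width) & (v >= 0) & (v < height)

    # Cells that project outside the image stay zero.
    bev = np.zeros(gx.shape + image.shape[2:], dtype=image.dtype)
    bev[valid] = image[v[valid], u[valid]]
    return bev
```

With a camera looking straight down, this warp reduces to a scaled and flipped copy of the image. At oblique viewing angles, distant ground cells map to only a few pixels, which is the regime where the abstract argues a learned deformation is needed.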
