Paper Title

SuperYOLO: Super Resolution Assisted Object Detection in Multimodal Remote Sensing Imagery

Paper Authors

Zhang, Jiaqing, Lei, Jie, Xie, Weiying, Fang, Zhenman, Li, Yunsong, Du, Qian

Paper Abstract

Accurate and timely detection of multiscale small objects that span only tens of pixels in remote sensing images (RSI) remains challenging. Most existing solutions design complex deep neural networks to learn feature representations that separate objects from the background, which often incurs a heavy computational burden. In this article, we propose SuperYOLO, an accurate yet fast object detection method for RSI that fuses multimodal data and performs high-resolution (HR) detection of multiscale objects with the aid of auxiliary super-resolution (SR) learning, while considering both detection accuracy and computational cost. First, we employ a symmetric compact multimodal fusion (MF) module to extract complementary information from different modalities, improving small object detection in RSI. Furthermore, we design a simple and flexible SR branch that learns HR feature representations capable of discriminating small objects from vast backgrounds given only low-resolution (LR) input, further improving detection accuracy. Moreover, the SR branch is discarded at inference so that it introduces no additional computation, and the LR input further reduces the model's computational cost. Experimental results show that, on the widely used VEDAI RS dataset, SuperYOLO achieves 75.09% mAP50, more than 10% higher than SOTA large models such as YOLOv5l, YOLOv5x, and the RS-designed YOLOrs. Meanwhile, the parameter count and GFLOPs of SuperYOLO are about 18x and 3.8x smaller than those of YOLOv5x, respectively. Our proposed model thus shows a favorable accuracy-speed tradeoff compared with state-of-the-art models. The code will be open-sourced at https://github.com/icey-zhang/SuperYOLO.
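The abstract's key efficiency idea, an auxiliary SR branch that shares the detector's features during training and is dropped at inference, can be illustrated with a minimal PyTorch sketch. This is not the authors' SuperYOLO implementation (see the linked repository for that); the module names TinyBackbone, SRHead, and ToyDetector, the layer widths, and the 2x SR target are illustrative assumptions.

```python
# Minimal sketch of a training-only SR branch (illustrative, not SuperYOLO itself):
# an auxiliary super-resolution head reuses the detector's shared features during
# training and is skipped at inference, so the deployed model keeps LR-input cost.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyBackbone(nn.Module):
    """Toy stand-in for the shared detection backbone."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.SiLU(),
        )

    def forward(self, x):
        return self.conv(x)  # features at 1/4 of the LR input resolution


class SRHead(nn.Module):
    """Auxiliary branch that reconstructs an HR image from the shared features."""
    def __init__(self, feat_ch=64, out_ch=3, scale=8):
        super().__init__()
        self.up = nn.Sequential(
            nn.Conv2d(feat_ch, out_ch * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # 1/4-resolution features -> 2x-LR target
        )

    def forward(self, feats):
        return self.up(feats)


class ToyDetector(nn.Module):
    def __init__(self, num_outputs=16):
        super().__init__()
        self.backbone = TinyBackbone()
        self.det_head = nn.Conv2d(64, num_outputs, 1)  # placeholder detection head
        self.sr_head = SRHead()                        # used only while training

    def forward(self, lr_img, hr_img=None):
        feats = self.backbone(lr_img)
        det_out = self.det_head(feats)
        if self.training and hr_img is not None:
            sr_out = self.sr_head(feats)
            sr_loss = F.l1_loss(sr_out, hr_img)  # auxiliary SR supervision
            return det_out, sr_loss
        return det_out  # the SR branch adds no cost at inference


if __name__ == "__main__":
    model = ToyDetector()
    lr = torch.randn(1, 3, 128, 128)  # low-resolution network input
    hr = torch.randn(1, 3, 256, 256)  # higher-resolution SR target (2x)
    model.train()
    det, sr_loss = model(lr, hr)      # detection output plus SR loss in training
    model.eval()
    det_only = model(lr)              # inference uses only the detection path
```

The sketch only demonstrates the training/inference asymmetry; the paper additionally fuses multimodal inputs before the backbone and uses a full YOLO-style detection head.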
