Deep Vehicle Detection in Satellite Video

Paper Title

Deep Vehicle Detection in Satellite Video

Authors

Roman Pflugfelder, Axel Weissenfeld, Julian Wagner

Abstract

This work presents a deep learning approach for vehicle detection in satellite video. Vehicle detection is perhaps impossible in single EO satellite images due to the tininess of vehicles (4-10 pixels) and their similarity to the background. Instead, we consider satellite video, which overcomes the lack of spatial information through the temporal consistency of vehicle movement. A new spatiotemporal model of a compact $3 \times 3$ convolutional neural network is proposed which omits pooling layers and uses leaky ReLUs. We then reformulate the output heatmap, including Non-Maximum Suppression (NMS), for the final segmentation. Empirical results on two new annotated satellite videos reconfirm the applicability of this approach for vehicle detection. More importantly, they indicate that pre-training on WAMI data and then fine-tuning on a few annotated video frames of a new video is sufficient. In our experiment, only five annotated images yield an $F_1$ score of 0.81 on a new video showing more complex traffic patterns than the Las Vegas video. Our best result on Las Vegas is an $F_1$ score of 0.87, which makes the proposed approach a leading method for this benchmark.
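The abstract mentions two steps that are easy to illustrate: Non-Maximum Suppression on an output heatmap, and the $F_1$ score used for evaluation. The following is a minimal sketch of both, not the authors' implementation. The function names `heatmap_nms` and `f1_score`, the `threshold` and `radius` parameters, and the toy heatmap values are all assumptions made for illustration. The abstract does not specify how the heatmap is reformulated or how detections are matched to ground truth.

```python
from typing import List, Tuple


def heatmap_nms(heatmap: List[List[float]],
                threshold: float = 0.5,
                radius: int = 1) -> List[Tuple[int, int]]:
    """Keep pixels that are at least `threshold` and not exceeded by any
    neighbour within `radius` (hypothetical parameters, not from the paper).
    Pixels tied on a plateau are all kept."""
    rows = len(heatmap)
    cols = len(heatmap[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue  # too weak to be a detection
            is_peak = True
            # Scan the (2*radius+1)^2 window, clipped at the image border.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    if (rr, cc) != (r, c) and heatmap[rr][cc] > value:
                        is_peak = False
            if is_peak:
                peaks.append((r, c))
    return peaks


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy heatmap (made-up values): two vehicle-like responses.
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.7],
]
detections = heatmap_nms(toy, threshold=0.5, radius=1)
```

In this toy case the response of 0.6 at position (2, 3) is suppressed by its stronger neighbour of 0.7, so only one detection survives in that corner. Because the vehicles are only 4-10 pixels in size, a real pipeline would need to choose the suppression radius with care. The value used here is arbitrary.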
