Paper Title

Semantic Labeling of Large-Area Geographic Regions Using Multi-View and Multi-Date Satellite Images and Noisy OSM Training Labels

Authors

Comandur, Bharath, Kak, Avinash C.

Abstract

We present a novel multi-view training framework and CNN architecture for combining information from multiple overlapping satellite images and noisy training labels derived from OpenStreetMap (OSM) to semantically label buildings and roads across large geographic regions (100 km$^2$). Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to traditional approaches that use the views independently of one another. A unique (and, perhaps, surprising) property of our system is that the modifications added to the tail-end of the CNN for learning from the multi-view data can be discarded at inference time with a relatively small penalty in overall performance. This implies that the benefits of training with multiple views are absorbed by all the layers of the network. Additionally, our approach adds only a small overhead in GPU-memory consumption even when training with as many as 32 views per scene. The system we present is end-to-end automated, which facilitates comparing classifiers trained directly on true orthophotos against those first trained on the off-nadir images whose predicted labels are subsequently translated to geographic coordinates. With no human supervision, our IoU scores for the buildings and roads classes are 0.8 and 0.64, respectively, which is better than state-of-the-art approaches that use OSM labels and are not completely automated.
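The abstract's most distinctive claim is that the tail-end modifications added for multi-view training can be discarded at inference with only a small performance penalty. The PyTorch sketch below illustrates that general idea under simplified assumptions: the module names, layer shapes, and the mean-pooling fusion are illustrative placeholders, not the authors' actual architecture or released code.

```python
# Minimal sketch (assumptions, not the authors' code): a per-view segmentation CNN
# plus a tail-end multi-view fusion module that is used only during training and
# dropped at inference.

import torch
import torch.nn as nn


class PerViewSegmenter(nn.Module):
    """Toy per-view CNN that maps one image to per-pixel class logits."""

    def __init__(self, in_ch=3, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(16, num_classes, 1)

    def forward(self, x):
        return self.head(self.features(x))


class MultiViewFusion(nn.Module):
    """Tail-end module active only in training: pools logits across the aligned
    views of one scene so a single loss against the (noisy OSM) labels applies."""

    def forward(self, per_view_logits):
        # per_view_logits: (num_views, C, H, W), assumed aligned to a common grid.
        return per_view_logits.mean(dim=0, keepdim=True)


segmenter = PerViewSegmenter()
fusion = MultiViewFusion()
criterion = nn.CrossEntropyLoss()

# Training step: several aligned views of one scene share one OSM-derived label map.
views = torch.randn(4, 3, 64, 64)              # 4 views of the same scene
osm_labels = torch.randint(0, 3, (1, 64, 64))  # noisy per-pixel class labels
fused = fusion(segmenter(views))               # multi-view head active
loss = criterion(fused, osm_labels)
loss.backward()

# Inference: the fusion head is discarded; each view is labeled independently.
with torch.no_grad():
    prediction = segmenter(views[:1]).argmax(dim=1)
print(prediction.shape)  # torch.Size([1, 64, 64])
```

Because only the segmenter's own parameters receive gradients from the multi-view loss in this sketch, discarding the fusion module at inference leaves a standard single-view network, which mirrors the behavior the abstract reports.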
