Paper Title

More Diverse Means Better: Multimodal Deep Learning Meets Remote Sensing Imagery Classification

Authors

Danfeng Hong, Lianru Gao, Naoto Yokoya, Jing Yao, Jocelyn Chanussot, Qian Du, Bing Zhang

Abstract

Classification and identification of the materials lying over or beneath the Earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS), and have attracted growing attention owing to recent advances in deep learning. Although deep networks have been successfully applied to single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that require fine-grained classification, due to the limited diversity of information. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML) -- cross-modality learning (CML) -- which arises widely in RS image classification applications. By focusing on "what", "where", and "how" to fuse, we present different fusion strategies and show how to train the deep networks and build the network architectures. Specifically, five fusion architectures are introduced and developed, and are further unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments under both MML and CML settings are conducted on two different multimodal RS datasets. Furthermore, the code and datasets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, as a contribution to the RS community.
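
To make the idea of multimodal fusion for pixel-wise RS classification concrete, here is a minimal, illustrative sketch of one feature-level ("middle") fusion design: two CNN branches encode two co-registered modalities, their features are concatenated, and a small classifier predicts the land-cover class of each patch. This is not the authors' official implementation (see the GitHub link above); the framework choice (PyTorch), the class names `BranchEncoder` and `MiddleFusionNet`, and the input sizes (144 spectral bands, 1 LiDAR band, 15 classes, 7x7 patches) are all assumptions made for illustration.

```python
# Illustrative sketch only, NOT the official MDL-RS code.
# Assumed inputs: hyperspectral patches (144 bands) and LiDAR patches (1 band).
import torch
import torch.nn as nn

class BranchEncoder(nn.Module):
    """Small CNN encoder for one modality: extracts a feature vector per patch."""
    def __init__(self, in_channels, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, feat_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(feat_dim), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # pool the spatial patch to a single vector
        )

    def forward(self, x):
        return self.net(x).flatten(1)  # (batch, feat_dim)

class MiddleFusionNet(nn.Module):
    """Two-branch network: encode each modality, concatenate features, classify."""
    def __init__(self, hs_channels=144, lidar_channels=1, num_classes=15, feat_dim=64):
        super().__init__()
        self.hs_branch = BranchEncoder(hs_channels, feat_dim)
        self.lidar_branch = BranchEncoder(lidar_channels, feat_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 128), nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, x_hs, x_lidar):
        # Feature-level (middle) fusion by concatenation of branch outputs.
        fused = torch.cat([self.hs_branch(x_hs), self.lidar_branch(x_lidar)], dim=1)
        return self.classifier(fused)

# Example: classify a batch of 7x7 patches from two co-registered modalities.
model = MiddleFusionNet()
logits = model(torch.randn(8, 144, 7, 7), torch.randn(8, 1, 7, 7))
print(logits.shape)  # torch.Size([8, 15])
```

Early fusion (stacking the modalities channel-wise before a single encoder) and late fusion (averaging per-branch class scores) follow the same pattern; the "what/where/how to fuse" question in the paper amounts to choosing which representation is combined, at which depth, and by which operation.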
