Paper Title
Interactive Context-Aware Network for RGB-T Salient Object Detection
Paper Authors
Paper Abstract
Salient object detection (SOD) focuses on distinguishing the most conspicuous objects in a scene. However, most related works are based on RGB images alone, which discard a large amount of useful information. Accordingly, with the maturity of thermal imaging technology, RGB-T (RGB-Thermal) multi-modality tasks are attracting increasing attention. Thermal infrared images carry important information that can be used to improve the accuracy of SOD prediction. To this end, methods that integrate multi-modal information and suppress noise are critical. In this paper, we propose a novel network called the Interactive Context-Aware Network (ICANet). It contains three modules that can effectively perform cross-modal and cross-scale fusion. We design a Hybrid Feature Fusion (HFF) module, which utilizes two types of feature extraction, to integrate the features of the two modalities. The Multi-Scale Attention Reinforcement (MSAR) and Upper Fusion (UF) blocks are responsible for the cross-scale fusion that converges different levels of features and generates the prediction maps. We also propose a novel Context-Aware Multi-Supervised Network (CAMSNet) to calculate the content loss between the prediction and the ground truth (GT). Experiments demonstrate that our network performs favorably against state-of-the-art RGB-T SOD methods.
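To make the cross-modal fusion idea concrete, below is a minimal PyTorch sketch of fusing same-resolution RGB and thermal feature maps through two complementary paths (point-wise fusion plus channel attention). The abstract does not specify the HFF module's internal design, so the `HybridFusionSketch` class, its two-branch structure, and all parameter choices here are illustrative assumptions, not ICANet's actual implementation.

```python
# A hedged sketch of cross-modal feature fusion in the spirit of the
# described HFF module; the exact architecture is an assumption.
import torch
import torch.nn as nn

class HybridFusionSketch(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Branch 1: point-wise (local) fusion of the concatenated modalities.
        self.local = nn.Conv2d(2 * channels, channels, kernel_size=1)
        # Branch 2: global channel attention over the concatenated features.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, thermal_feat: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb_feat, thermal_feat], dim=1)
        # The attention weights re-scale the locally fused features,
        # which can suppress noisy responses from either modality.
        return self.local(x) * self.attn(x)

# Usage: fuse RGB and thermal feature maps of matching shape.
rgb = torch.randn(1, 64, 80, 80)
thermal = torch.randn(1, 64, 80, 80)
fused = HybridFusionSketch(64)(rgb, thermal)  # -> (1, 64, 80, 80)
```

In a full RGB-T SOD pipeline, a block like this would sit at each backbone stage, with the fused multi-level features then passed to cross-scale fusion (the role the abstract assigns to the MSAR and UF blocks).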