Paper Title
MAFNet: A Multi-Attention Fusion Network for RGB-T Crowd Counting
Paper Authors
Paper Abstract
RGB-Thermal (RGB-T) crowd counting is a challenging task that uses thermal images as complementary information to RGB images to counteract the degraded performance of unimodal RGB-based methods in scenes with low illumination or similar backgrounds. Most existing methods propose well-designed structures for cross-modal fusion in RGB-T crowd counting. However, these methods struggle to encode cross-modal contextual semantic information in RGB-T image pairs. To address this problem, we propose a two-stream RGB-T crowd counting network called the Multi-Attention Fusion Network (MAFNet), which aims to fully capture long-range contextual information from both the RGB and thermal modalities based on the attention mechanism. Specifically, in the encoder part, a Multi-Attention Fusion (MAF) module is embedded into different stages of the two modality-specific branches for cross-modal fusion at the global level. In addition, a Multi-modal Multi-scale Aggregation (MMA) regression head is introduced to make full use of the multi-scale and contextual information across modalities to generate high-quality crowd density maps. Extensive experiments on two popular datasets show that the proposed MAFNet is effective for RGB-T crowd counting and achieves state-of-the-art performance.
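The abstract does not give implementation details, but the core idea of attention-based cross-modal fusion between the two encoder branches can be illustrated with a minimal PyTorch sketch. Everything below (the class name `CrossModalAttentionFusion`, the use of `nn.MultiheadAttention`, and the residual/normalization layout) is a hypothetical reading of such a module, not the authors' actual MAF design:

```python
import torch
import torch.nn as nn


class CrossModalAttentionFusion(nn.Module):
    """Hypothetical sketch of attention-based cross-modal fusion.

    Each modality's feature map is flattened into a token sequence and
    attends to the other modality via multi-head attention, capturing
    long-range cross-modal context at the global level.
    """

    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.rgb_from_thermal = nn.MultiheadAttention(
            channels, num_heads, batch_first=True)
        self.thermal_from_rgb = nn.MultiheadAttention(
            channels, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(channels)
        self.norm_thermal = nn.LayerNorm(channels)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # rgb, thermal: (B, C, H, W) feature maps from the two branches.
        b, c, h, w = rgb.shape
        r = rgb.flatten(2).transpose(1, 2)      # (B, H*W, C) tokens
        t = thermal.flatten(2).transpose(1, 2)  # (B, H*W, C) tokens

        # Each branch queries the other modality, then adds the
        # attended context residually and normalizes.
        r_out, _ = self.rgb_from_thermal(r, t, t)
        t_out, _ = self.thermal_from_rgb(t, r, r)
        r = self.norm_rgb(r + r_out)
        t = self.norm_thermal(t + t_out)

        # Reshape back to spatial maps for the next encoder stage.
        rgb_fused = r.transpose(1, 2).reshape(b, c, h, w)
        thermal_fused = t.transpose(1, 2).reshape(b, c, h, w)
        return rgb_fused, thermal_fused


if __name__ == "__main__":
    maf = CrossModalAttentionFusion(channels=256)
    rgb_feat = torch.randn(2, 256, 24, 32)
    thermal_feat = torch.randn(2, 256, 24, 32)
    fused_rgb, fused_thermal = maf(rgb_feat, thermal_feat)
    print(fused_rgb.shape, fused_thermal.shape)  # (2, 256, 24, 32) each
```

Returning fused features for both branches, rather than a single merged map, matches the abstract's description of embedding the fusion module at multiple stages of two modality-specific streams; each stream can continue encoding after every fusion step.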