论文标题
人群计数的多尺度功能聚合
Multi-scale Feature Aggregation for Crowd Counting
论文作者
论文摘要
在过去的几年中,基于卷积的神经网络(CNN)的人群计数方法取得了令人鼓舞的结果。但是,对于准确的计数估计,规模变化问题仍然是一个巨大的挑战。在本文中,我们提出了一个多尺度特征聚合网络(MSFANET),可以在某种程度上减轻此问题。具体而言,我们的方法由两个特征聚合模块组成:短聚合(Shortagg)和Skip Contregation(Skipagg)。 Shortagg模块聚集了相邻卷积块的特征。其目的是制作具有从底部到网络顶部逐渐融合的不同接收场的功能。 Skipagg模块将具有小型接收场的特征直接传播到具有更大接收场的特征。它的目的是促进特征与大小接收场的融合。特别是,Skipagg模块引入了Swin Transformer块中的本地自我注意事项,以包含丰富的空间信息。此外,我们通过考虑不均匀的人群分布来提出基于局部和全球的计数损失。在四个具有挑战性的数据集(Shanghaitech数据集,UCF_CC_50数据集,UCF-QNRF数据集,WorldExpo'10 DataSet)上进行了广泛的实验,这表明与先前先前的先进方法相比,提出的易于实现的MSFANET可以实现有希望的结果。
Convolutional Neural Network (CNN) based crowd counting methods have achieved promising results in the past few years. However, the scale variation problem is still a huge challenge for accurate count estimation. In this paper, we propose a multi-scale feature aggregation network (MSFANet) that can alleviate this problem to some extent. Specifically, our approach consists of two feature aggregation modules: the short aggregation (ShortAgg) and the skip aggregation (SkipAgg). The ShortAgg module aggregates the features of the adjacent convolution blocks. Its purpose is to make features with different receptive fields fused gradually from the bottom to the top of the network. The SkipAgg module directly propagates features with small receptive fields to features with much larger receptive fields. Its purpose is to promote the fusion of features with small and large receptive fields. Especially, the SkipAgg module introduces the local self-attention features from the Swin Transformer blocks to incorporate rich spatial information. Furthermore, we present a local-and-global based counting loss by considering the non-uniform crowd distribution. Extensive experiments on four challenging datasets (ShanghaiTech dataset, UCF_CC_50 dataset, UCF-QNRF Dataset, WorldExpo'10 dataset) demonstrate the proposed easy-to-implement MSFANet can achieve promising results when compared with the previous state-of-the-art approaches.