Paper Title
CrowdMLP: Weakly-Supervised Crowd Counting via Multi-Granularity MLP
Authors
Abstract
Existing state-of-the-art crowd counting algorithms rely heavily on location-level annotations, which are burdensome to acquire. When only count-level (weak) supervisory signals are available, regressing total counts is arduous and error-prone because explicit spatial constraints are lacking. To address this issue, a novel and efficient counter, CrowdMLP, is presented; it models global dependencies among embeddings and regresses total counts with a multi-granularity MLP regressor. Specifically, a locally-focused pre-trained frontend is cascaded to extract crude feature maps with intrinsic spatial cues, which prevent the model from collapsing into trivial outcomes. The crude embeddings, along with the raw crowd scenes, are tokenized at different granularity levels. The multi-granularity MLP then mixes tokens along the cardinality, channel, and spatial dimensions to mine global information. An effective proxy task, Split-Counting, is also proposed to overcome, in a self-supervised manner, the barrier of limited samples and the shortage of spatial hints. Extensive experiments demonstrate that CrowdMLP significantly outperforms existing weakly-supervised counting algorithms and performs on par with state-of-the-art location-level supervised approaches.
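As a rough illustration of two ideas named in the abstract, the sketch below splits a scene into non-overlapping patches at two granularity levels and measures a count-consistency gap between the whole scene and its splits. The abstract does not specify how tokenization or Split-Counting is implemented, so everything here is an assumption: the helper names `split_into_patches`, `split_counting_gap`, and `toy_counter` are hypothetical, the consistency reading of Split-Counting (whole-scene count should equal the sum of split counts) is one plausible interpretation, and the toy counter simply sums a density-like grid rather than running an MLP regressor.

```python
from typing import Callable, List

Grid = List[List[float]]  # toy stand-in for an image or feature map (assumption)


def split_into_patches(grid: Grid, patch: int) -> List[Grid]:
    """Split an H x W grid into non-overlapping patch x patch tiles, row-major."""
    height, width = len(grid), len(grid[0])
    if height % patch or width % patch:
        raise ValueError("grid size must be divisible by the patch size")
    return [
        [row[left:left + patch] for row in grid[top:top + patch]]
        for top in range(0, height, patch)
        for left in range(0, width, patch)
    ]


def split_counting_gap(
    grid: Grid, patch: int, counter: Callable[[Grid], float]
) -> float:
    """Absolute gap between the whole-scene count and the summed patch counts.

    Assumed form of a split-consistency signal; it needs no count labels.
    """
    whole = counter(grid)
    parts = sum(counter(tile) for tile in split_into_patches(grid, patch))
    return abs(whole - parts)


def toy_counter(grid: Grid) -> float:
    """Hypothetical counter: sums a density-like grid. Not the CrowdMLP regressor."""
    return float(sum(sum(row) for row in grid))


# A 4 x 4 toy "density" scene containing 9 people in total.
scene = [
    [1, 0, 0, 2],
    [0, 1, 0, 0],
    [0, 0, 3, 0],
    [1, 0, 0, 1],
]

coarse = split_into_patches(scene, 4)  # coarse granularity: one 4 x 4 token
fine = split_into_patches(scene, 2)    # fine granularity: four 2 x 2 tokens
gap = split_counting_gap(scene, 2, toy_counter)
```

With an additive counter such as `toy_counter` the gap is zero by construction; for a learned regressor the gap would generally be nonzero, and a quantity of this kind could serve as a label-free training signal under the assumptions stated above.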