Paper Title

Multi-direction and Multi-scale Pyramid in Transformer for Video-based Pedestrian Retrieval

Authors

Xianghao Zang, Ge Li, Wei Gao

Abstract

In video surveillance, pedestrian retrieval (also called person re-identification) is a critical task that aims to retrieve a pedestrian of interest across non-overlapping cameras. Recently, transformer-based models have made significant progress on this task. However, these models still overlook fine-grained, part-level information. This paper proposes a multi-direction and multi-scale Pyramid in Transformer (PiT) to address this problem. In a transformer-based architecture, each pedestrian image is split into many patches, which are then fed to transformer layers to obtain the feature representation of the image. To explore fine-grained information, the paper applies vertical division and horizontal division to these patches to generate human parts in different directions. These parts provide more fine-grained information. To fuse multi-scale feature representations, the paper presents a pyramid structure containing global-level information and many pieces of local-level information from different scales. The feature pyramids of all pedestrian images from the same video are fused to form the final multi-direction and multi-scale feature representation. Experimental results on two challenging video-based benchmarks, MARS and iLIDS-VID, show that the proposed PiT achieves state-of-the-art performance. Extensive ablation studies demonstrate the superiority of the proposed pyramid structure. The code is available at https://git.openi.org.cn/zangxh/PiT.git.
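The division-and-pyramid idea in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: it assumes the patch features already form a rows-by-columns grid, substitutes plain average pooling for the transformer layers, reads "vertical division" as cutting the grid into top-to-bottom bands and "horizontal division" as left-to-right bands, and fuses frames by averaging. The function names, the scale choices (2 and 4 vertical parts, 2 horizontal parts), and the fusion rule are all hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Grid = List[List[Vector]]  # grid[row][col] holds one patch feature vector


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def split_parts(grid: Grid, num_parts: int, direction: str) -> List[Vector]:
    """Cut the patch grid into equal bands and pool each band into one part.

    Assumption: "vertical" yields top-to-bottom bands (cuts along the height),
    "horizontal" yields left-to-right bands (cuts along the width).
    """
    rows, cols = len(grid), len(grid[0])
    size = rows if direction == "vertical" else cols
    if size % num_parts != 0:
        raise ValueError("grid size must be divisible by num_parts")
    step = size // num_parts
    parts = []
    for p in range(num_parts):
        band = range(p * step, (p + 1) * step)
        if direction == "vertical":
            patches = [grid[r][c] for r in band for c in range(cols)]
        else:
            patches = [grid[r][c] for r in range(rows) for c in band]
        parts.append(mean_pool(patches))
    return parts


def frame_pyramid(grid: Grid,
                  vertical_scales: Sequence[int] = (2, 4),
                  horizontal_scales: Sequence[int] = (2,)) -> List[Vector]:
    """Return the global feature followed by part features at every scale."""
    all_patches = [vec for row in grid for vec in row]
    levels = [mean_pool(all_patches)]  # global-level information
    for k in vertical_scales:
        levels.extend(split_parts(grid, k, "vertical"))
    for k in horizontal_scales:
        levels.extend(split_parts(grid, k, "horizontal"))
    return levels


def video_pyramid(frames: Sequence[Grid], **scales) -> List[Vector]:
    """Fuse per-frame pyramids by averaging each entry over the frames."""
    per_frame = [frame_pyramid(grid, **scales) for grid in frames]
    return [mean_pool(entry) for entry in zip(*per_frame)]
```

With the default scales, each frame produces one global vector plus 2 + 4 vertical parts and 2 horizontal parts, so `video_pyramid` returns nine vectors per video. In a real model each part would more plausibly be encoded by a learned layer rather than averaged.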
