Paper Title
Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
Authors
Abstract
Position Embeddings (PEs), an arguably indispensable component of Vision Transformers (ViTs), have been shown to improve the performance of ViTs on many vision tasks. However, PEs carry a potentially high risk of privacy leakage because they expose the spatial information of the input patches. This caveat naturally raises a series of questions about the impact of PEs on accuracy, privacy, prediction consistency, and so on. To tackle these issues, we propose a Masked Jigsaw Puzzle (MJP) position embedding method. In particular, MJP first shuffles the selected patches via our block-wise random jigsaw puzzle shuffle algorithm and occludes their corresponding PEs. Meanwhile, for the non-occluded patches, the PEs remain unchanged, but their spatial relation is strengthened via our dense absolute localization regressor. The experimental results reveal that 1) PEs explicitly encode the 2D spatial relationship and lead to severe privacy leakage under gradient inversion attacks; 2) training ViTs with naively shuffled patches can alleviate the problem, but it harms accuracy; 3) under a certain shuffle ratio, the proposed MJP not only boosts performance and robustness on large-scale datasets (i.e., ImageNet-1K and ImageNet-C, -A/O) but also improves privacy preservation under typical gradient attacks by a large margin. The source code and trained models are available at https://github.com/yhlleo/MJP.
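The shuffle-and-occlude step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `masked_jigsaw_shuffle`, the choice of a single contiguous rectangular block, and the way the shuffle ratio is mapped to a block size are all assumptions made for illustration. The sketch only produces a patch permutation and a mask of positions whose PEs would be occluded.

```python
import random


def masked_jigsaw_shuffle(grid_h, grid_w, ratio, seed=None):
    """Hypothetical helper: pick a block of patches, shuffle it, mark it occluded.

    Returns (perm, occluded) for a grid_h x grid_w patch grid, where
    perm[i] is the index of the source patch placed at position i and
    occluded[i] is True where the position embedding would be masked.
    """
    rng = random.Random(seed)
    n = grid_h * grid_w
    perm = list(range(n))
    occluded = [False] * n
    if ratio <= 0:
        return perm, occluded  # nothing selected: identity order, no masking

    # Assumption: the selected region is one rectangle covering about ratio * n patches.
    block_h = max(1, min(grid_h, round(grid_h * ratio ** 0.5)))
    block_w = max(1, min(grid_w, round(grid_w * ratio ** 0.5)))
    top = rng.randint(0, grid_h - block_h)
    left = rng.randint(0, grid_w - block_w)

    # Flat indices of the patches inside the block, in row-major order.
    selected = [r * grid_w + c
                for r in range(top, top + block_h)
                for c in range(left, left + block_w)]

    # Shuffle only the selected patches among themselves.
    shuffled = selected[:]
    rng.shuffle(shuffled)
    for dst, src in zip(selected, shuffled):
        perm[dst] = src
        occluded[dst] = True
    return perm, occluded


# Example: a 14 x 14 patch grid with a quarter of the patches shuffled.
perm, occluded = masked_jigsaw_shuffle(14, 14, 0.25, seed=0)
```

In a model, the patch sequence would be reordered as `[patches[j] for j in perm]`, and the PEs at positions where `occluded` is True would be replaced by some shared placeholder (again an assumption here), while all other positions keep their original PEs.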