Paper Title
Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs
Paper Authors
Paper Abstract
Token-mixing multi-layer perceptron (MLP) models have shown competitive performance in computer vision tasks with a simple architecture and relatively small computational cost. Their success in maintaining computation efficiency is mainly attributed to avoiding the use of self-attention that is often computationally heavy, yet this is at the expense of not being able to mix tokens both globally and locally. In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting. In addition to conventional mixing and shifting techniques, MS-MLP mixes both neighboring and distant tokens from fine- to coarse-grained levels and then gathers them via a shifting operation. This directly contributes to the interactions between global and local tokens. Being simple to implement, MS-MLP achieves competitive performance in multiple vision benchmarks. For example, an MS-MLP with 85 million parameters achieves 83.8% top-1 classification accuracy on ImageNet-1K. Moreover, by combining MS-MLP with state-of-the-art Vision Transformers such as the Swin Transformer, we show MS-MLP achieves further improvements on three different model scales, e.g., by 0.5% on ImageNet-1K classification with Swin-B. The code is available at: https://github.com/JegZheng/MS-MLP.
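To make the mix-shift idea concrete, below is a minimal PyTorch sketch of one possible mix-shift block, written from the abstract's description alone: channels are split into groups, each group is mixed locally with a depthwise convolution whose receptive field grows with that group's shift distance, and the shifted groups are then gathered by a pointwise projection. The class name `MixShiftBlock`, the parameter `num_groups`, and the use of a circular shift are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn


class MixShiftBlock(nn.Module):
    """Hypothetical sketch of a mix-shift operation.

    Channels are chunked into groups; group g is mixed with a depthwise
    conv of kernel size 2g + 1 (receptive field grows with shift amount),
    shifted by g pixels, and all groups are gathered by a 1x1 conv.
    """

    def __init__(self, dim: int, num_groups: int = 4):
        super().__init__()
        assert dim % num_groups == 0, "dim must be divisible by num_groups"
        self.num_groups = num_groups
        group_dim = dim // num_groups
        # Larger shift distance -> larger local mixing receptive field.
        self.mixers = nn.ModuleList(
            nn.Conv2d(group_dim, group_dim, kernel_size=2 * g + 1,
                      padding=g, groups=group_dim)  # depthwise mixing
            for g in range(num_groups)
        )
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)  # gather / channel mix

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W)
        groups = torch.chunk(x, self.num_groups, dim=1)
        shifted = []
        for g, (feat, mixer) in enumerate(zip(groups, self.mixers)):
            feat = mixer(feat)  # local mixing, receptive field 2g + 1
            # Circular shift by g pixels along width (a simplification;
            # a zero-padded shift could be used instead).
            shifted.append(torch.roll(feat, shifts=g, dims=3))
        return self.proj(torch.cat(shifted, dim=1))


# Usage sketch: a 64-channel feature map on a 14x14 grid.
if __name__ == "__main__":
    block = MixShiftBlock(dim=64, num_groups=4)
    out = block(torch.randn(2, 64, 14, 14))
    print(out.shape)  # torch.Size([2, 64, 14, 14])
```

In a full model, a block like this would presumably stand in for the token-mixing step of each MLP layer, with the usual channel MLP, normalization, and residual connections around it; coupling the shift distance to the mixing kernel size is what lets distant groups contribute coarser context while nearby groups stay fine-grained.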