Paper Title
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
Paper Authors
Paper Abstract
Following its success in the language domain, the self-attention mechanism (Transformer) has been adopted in the vision domain and has recently achieved great success there. As another stream, the multi-layer perceptron (MLP) has also been explored in the vision domain. These architectures, as alternatives to traditional CNNs, have been attracting attention recently, and many methods have been proposed. As a model that combines parameter efficiency and performance with locality and hierarchy in image recognition, we propose gSwin, which merges the two streams: Swin Transformer and (multi-head) gMLP. We show that gSwin achieves better accuracy than Swin Transformer on three vision tasks (image classification, object detection, and semantic segmentation) with a smaller model size.
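The abstract does not spell out how the two streams are merged, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a gMLP-style spatial gating unit (split the channels, mix one half across tokens, use it to gate the other half) applied inside non-overlapping windows that are optionally shifted by a cyclic roll, as in Swin Transformer. All function names, shapes, and parameters are hypothetical; the multi-head variant, learned normalization parameters, and any masking of tokens that wrap around the border are omitted.

```python
import numpy as np


def window_partition(x, window):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window * window, C).
    """
    h, w, c = x.shape
    assert h % window == 0 and w % window == 0, "H and W must be multiples of window"
    x = x.reshape(h // window, window, w // window, window, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (rows of windows, cols of windows, wh, ww, C)
    return x.reshape(-1, window * window, c)


def spatial_gating_unit(tokens, weight, bias):
    """gMLP-style gating over the token axis of each window (assumed form).

    tokens: (num_windows, n, 2 * d), weight: (n, n), bias: (n,)
    Returns an array of shape (num_windows, n, d).
    """
    u, v = np.split(tokens, 2, axis=-1)
    # Normalize v over channels (layer norm without learned scale and shift).
    v = (v - v.mean(-1, keepdims=True)) / np.sqrt(v.var(-1, keepdims=True) + 1e-6)
    # Mix information across the n tokens of each window.
    v = np.einsum("mn,bnd->bmd", weight, v) + bias[None, :, None]
    return u * v


def shifted_window_gating(x, window, shift, weight, bias):
    """Cyclically shift the map, partition it into windows, and gate each window."""
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return spatial_gating_unit(window_partition(shifted, window), weight, bias)
```

With a zero `weight` and a `bias` of ones, the gate multiplies by one and the unit simply returns the first half of the channels, which gives a convenient sanity check. Alternating `shift=0` and `shift=window // 2` between layers is the usual shifted-window pattern and is assumed here rather than taken from the abstract.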