Paper Title
3D Shuffle-Mixer: An Efficient Context-Aware Vision Learner of Transformer-MLP Paradigm for Dense Prediction in Medical Volume
Authors
Abstract
Dense prediction in medical volumes provides rich guidance for clinical analysis. CNN backbones have hit a bottleneck because they lack long-range dependency and global context modeling power. Recent works have proposed combining vision transformers with CNNs, owing to the transformers' strong global capture and learning capabilities. However, most of these works simply apply a pure transformer, which has several fatal flaws (i.e., lack of inductive bias, heavy computation, and little consideration for 3D data). Designing an elegant and efficient vision transformer learner for dense prediction in medical volumes is therefore both promising and challenging. In this paper, we propose a novel 3D Shuffle-Mixer network built on a new Local Vision Transformer-MLP paradigm for medical dense prediction. In our network, a local vision transformer block shuffles and learns spatial context from full-view slices of the rearranged volume, a residual axial-MLP mixes and captures the remaining volume context in a slice-aware manner, and an MLP view aggregator projects the learned full-view rich context onto the volume feature in a view-aware manner. Moreover, an Adaptive Scaled Enhanced Shortcut is proposed for the local vision transformer to adaptively enhance features along the spatial and channel dimensions, and a CrossMerge is proposed to skip-connect multi-scale features appropriately in the pyramid architecture. Extensive experiments demonstrate that the proposed model outperforms other state-of-the-art medical dense prediction methods.
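To make the idea of "full-view slices of the rearranged volume" more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a channels-last volume of shape (D, H, W, C), uses a hypothetical `per_slice_op` (mean-centering each slice) as a stand-in for the local vision transformer block, and replaces the learned MLP view aggregator with fixed weights. All function names here are illustrative assumptions.

```python
import numpy as np


def to_view_slices(volume, axis):
    """Rearrange a (D, H, W, C) volume into a stack of 2D slices along `axis`.

    The chosen spatial axis is moved to the front, so the result has shape
    (num_slices, a, b, C), where a and b are the two remaining spatial axes.
    """
    return np.moveaxis(volume, axis, 0)


def from_view_slices(slices, axis):
    """Inverse of to_view_slices: move the slice axis back to position `axis`."""
    return np.moveaxis(slices, 0, axis)


def per_slice_op(slices):
    """Hypothetical placeholder for a per-slice spatial learner.

    It only subtracts each slice's spatial mean per channel; the paper's
    local vision transformer block would go here instead.
    """
    return slices - slices.mean(axis=(1, 2), keepdims=True)


def full_view_mix(volume, view_weights=(1 / 3, 1 / 3, 1 / 3)):
    """Process the volume from all three views and fuse the results.

    Fixed weights are an assumption standing in for a learned,
    view-aware aggregator.
    """
    fused = np.zeros_like(volume)
    for axis, weight in zip(range(3), view_weights):
        slices = to_view_slices(volume, axis)      # rearrange into one view
        processed = per_slice_op(slices)           # learn within each slice
        fused += weight * from_view_slices(processed, axis)  # restore layout
    return fused


volume = np.random.default_rng(0).normal(size=(4, 6, 8, 2))
fused = full_view_mix(volume)
print(fused.shape)  # (4, 6, 8, 2)
```

The sketch only illustrates the data flow: each view sees the same volume sliced along a different axis, and the per-view outputs are returned to the original layout before being combined.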