Paper Title

Joint rotational invariance and adversarial training of a dual-stream Transformer yields state of the art Brain-Score for Area V4

Paper Authors

William Berrios, Arturo Deza

Abstract

Modern high-scoring models of vision in the Brain-Score competition do not stem from Vision Transformers. However, in this paper, we provide evidence against the unexpected trend of Vision Transformers (ViT) not being perceptually aligned with human visual representations by showing how a dual-stream Transformer, a CrossViT $\textit{\`a la}$ Chen et al. (2021), under a joint rotationally-invariant and adversarial optimization procedure yields 2nd place in the aggregate Brain-Score 2022 competition (Schrimpf et al., 2020b) averaged across all visual categories, and at the time of the competition held 1st place for the highest explainable variance of area V4. In addition, our current Transformer-based model achieves greater explainable variance for areas V4, IT and Behaviour than a biologically-inspired CNN (ResNet50) that integrates a frontal V1-like computation module (Dapello et al., 2020). To assess the contribution of the optimization scheme with respect to the CrossViT architecture, we perform several additional experiments on differently optimized CrossViTs regarding adversarial robustness, common corruption benchmarks, mid-ventral stimuli interpretation and feature inversion. Against our initial expectations, our family of results provides tentative support for an $\textit{"All roads lead to Rome"}$ argument enforced via a joint optimization rule, even for non-biologically-motivated models of vision such as Vision Transformers. Code is available at https://github.com/williamberrios/BrainScore-Transformers
