Paper Title


Are Vision Transformers Robust to Spurious Correlations?

Paper Authors

Soumya Suvra Ghosal, Yifei Ming, Yixuan Li

Abstract


Deep neural networks may be susceptible to learning spurious correlations that hold on average but not in atypical test samples. With the recent emergence of vision transformer (ViT) models, it remains underexplored how spurious correlations manifest in such architectures. In this paper, we systematically investigate the robustness of vision transformers to spurious correlations on three challenging benchmark datasets and compare their performance with popular CNNs. Our study reveals that when pre-trained on a sufficiently large dataset, ViT models are more robust to spurious correlations than CNNs. Key to their success is the ability to generalize better from the examples where spurious correlations do not hold. Further, we perform extensive ablations and experiments to understand the role of the self-attention mechanism in providing robustness under spuriously correlated environments. We hope that our work will inspire future research on further understanding the robustness of ViT models.
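The abstract attributes the ViT's robustness to its self-attention mechanism, which lets every image patch token weigh information from all other patches rather than only a local neighborhood. As a rough illustration (not the paper's code; the projection matrices and token dimensions here are arbitrary placeholders), a single self-attention head can be sketched in NumPy as:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention: every token attends to all tokens.

    x: (n_tokens, d) patch embeddings; w_q/w_k/w_v: (d, d) projections.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product scores, then a numerically stable row-wise softmax.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # each output token is a global mixture of all tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))               # 4 patch tokens of dimension 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(tokens, w_q, w_k, w_v)
print(out.shape)  # (4, 8): same shape as the input, globally mixed
```

Because each output row is a convex combination over all patches, the model can, in principle, down-weight background patches that carry a spurious cue, which is one intuition for the robustness the paper ablates.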
