Paper Title

Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness

Authors

Pu Zhao, Pin-Yu Chen, Payel Das, Karthikeyan Natesan Ramamurthy, Xue Lin

Abstract

Mode connectivity provides novel geometric insights into loss landscapes and enables building high-accuracy pathways between well-trained neural networks. In this work, we propose to employ mode connectivity in loss landscapes to study the adversarial robustness of deep neural networks, and provide novel methods for improving this robustness. Our experiments cover various types of adversarial attacks applied to different network architectures and datasets. When network models are tampered with by backdoor or error-injection attacks, our results demonstrate that a path connection learned using a limited amount of bona fide data can effectively mitigate adversarial effects while maintaining the original accuracy on clean data. Therefore, mode connectivity provides users with the power to repair backdoored or error-injected models. We also use mode connectivity to investigate the loss landscapes of regular and robust models against evasion attacks. Experiments show that there exists a barrier in adversarial robustness loss on the path connecting regular and adversarially trained models. A high correlation is observed between the adversarial robustness loss and the largest eigenvalue of the input Hessian matrix, for which theoretical justifications are provided. Our results suggest that mode connectivity offers a holistic tool and practical means for evaluating and improving adversarial robustness.
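To make the idea of a "pathway between trained networks" concrete, here is a minimal sketch of one parameterization commonly used in the mode connectivity literature, a quadratic Bezier curve between two weight vectors. The abstract does not say which path family the authors use, so treat this as an assumption. The names `bezier_point`, `toy_loss`, and `max_loss_on_path` are hypothetical, the loss is a toy stand-in, and the bend point is picked by hand rather than learned from data.

```python
from typing import List

Vector = List[float]

def bezier_point(w1: Vector, theta: Vector, w2: Vector, t: float) -> Vector:
    """Point on the quadratic Bezier curve from w1 (t=0) to w2 (t=1) with bend theta."""
    a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    return [a * x + b * y + c * z for x, y, z in zip(w1, theta, w2)]

def toy_loss(w: Vector) -> float:
    """Hypothetical stand-in loss whose minima lie on the unit circle."""
    r2 = sum(x * x for x in w)
    return (r2 - 1.0) ** 2

def max_loss_on_path(w1: Vector, theta: Vector, w2: Vector, steps: int = 21) -> float:
    """Largest loss seen at evenly spaced points t in [0, 1] along the path."""
    return max(
        toy_loss(bezier_point(w1, theta, w2, i / (steps - 1)))
        for i in range(steps)
    )

# Two "trained models" (both are minima of the toy loss).
w1 = [1.0, 0.0]
w2 = [-1.0, 0.0]

# With the bend at the midpoint of w1 and w2 the curve is a straight line,
# which crosses the high-loss region around the origin.
straight = max_loss_on_path(w1, [0.0, 0.0], w2)

# A hand-picked bend keeps the whole path close to the low-loss circle.
# In practice the bend would be trained, not chosen by hand.
bent = max_loss_on_path(w1, [0.0, 2.0], w2)

print(f"max loss on straight path: {straight:.4f}")
print(f"max loss on bent path:     {bent:.4f}")
```

In this toy setting the straight segment hits a loss barrier while the bent curve avoids it, which is the geometric picture behind a low-loss connection between two models.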
