Paper Title
On the Generalization Properties of Adversarial Training
Paper Authors
Paper Abstract
Modern machine learning and deep learning models are shown to be vulnerable when test data are slightly perturbed. Existing theoretical studies of adversarial training algorithms mostly focus on either adversarial training losses or local convergence properties. In contrast, this paper studies the generalization performance of a generic adversarial training algorithm. Specifically, we consider linear regression models and two-layer neural networks (with lazy training) using squared loss, under low-dimensional and high-dimensional regimes. In the former regime, after overcoming the non-smoothness of adversarial training, the adversarial risk of the trained models can converge to the minimal adversarial risk. In the latter regime, we discover that data interpolation prevents the adversarially robust estimator from being consistent. Therefore, inspired by the success of the least absolute shrinkage and selection operator (LASSO), we incorporate the L1 penalty into high-dimensional adversarial learning and show that it leads to consistent adversarially robust estimation. A series of numerical studies are conducted to demonstrate how smoothness and L1 penalization help improve the adversarial robustness of DNN models.
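To make the setting concrete, the following is a minimal sketch, not the paper's actual algorithm, of L1-penalized adversarial training for linear regression under an ℓ∞ perturbation budget. It relies on the standard closed form of the inner maximization for a linear model with squared loss: max over ‖δ‖∞ ≤ ε of (y − w·(x+δ))² equals (|y − w·x| + ε‖w‖₁)². The function names, step sizes, and data below are illustrative assumptions, and the optimizer is plain subgradient descent rather than whatever training procedure the paper analyzes.

```python
import numpy as np

def adv_sq_loss(w, X, y, eps):
    """Adversarial squared loss for a linear model under l_inf attacks.

    For f(x) = w.x, the worst-case perturbation has the closed form
    max_{||d||_inf <= eps} (y - w.(x+d))^2 = (|y - w.x| + eps*||w||_1)^2.
    """
    r = np.abs(y - X @ w) + eps * np.abs(w).sum()
    return np.mean(r ** 2)

def train(X, y, eps=0.1, lam=0.05, lr=0.01, steps=2000):
    """Subgradient descent on the adversarial loss plus lam * ||w||_1.

    Illustrative sketch only; eps, lam, lr, and steps are arbitrary choices.
    """
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        resid = y - X @ w
        s = np.sign(resid)                      # subgradient of |resid|
        r = np.abs(resid) + eps * np.abs(w).sum()
        # d/dw of (1/n) sum_i (|r_i| + eps*||w||_1)^2  +  lam*||w||_1
        grad = (2.0 / n) * (-(X.T @ (r * s)) + eps * np.sign(w) * r.sum()) \
               + lam * np.sign(w)
        w -= lr * grad
    return w

# Toy usage on synthetic sparse data (hypothetical example).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_true = np.array([1.0, 0.0, 2.0, 0.0, 0.0])
y = X @ w_true + 0.1 * rng.standard_normal(200)
w_hat = train(X, y)
```

The L1 term plays the role the abstract assigns to it: in high dimensions it discourages the interpolating solutions that break consistency, shrinking irrelevant coordinates of the estimator toward zero.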