Paper Title
Provable Adversarial Robustness for Fractional Lp Threat Models
Paper Authors

Paper Abstract
In recent years, researchers have extensively studied adversarial robustness in a variety of threat models, including L_0, L_1, L_2, and L_infinity-norm bounded adversarial attacks. However, attacks bounded by fractional L_p "norms" (quasi-norms defined by the L_p distance with 0 < p < 1) have yet to be thoroughly considered. We proactively propose a defense with several desirable properties: it provides provable (certified) robustness, scales to ImageNet, and yields deterministic (rather than high-probability) certified guarantees when applied to quantized data (e.g., images). Our technique for fractional L_p robustness constructs expressive, deep classifiers that are globally Lipschitz with respect to the L_p^p metric, for any 0 < p < 1. However, our method is even more general: we can construct classifiers that are globally Lipschitz with respect to any metric defined as the sum of concave functions of the components. Our approach builds on recent work by Levine and Feizi (2021), which provides a provable defense against L_1 attacks. We demonstrate that our proposed guarantees are highly non-vacuous compared to the trivial baseline of using Levine and Feizi (2021) directly and applying norm inequalities. Code is available at https://github.com/alevine0/fractionalLpRobustness.
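To make the threat model concrete, the sketch below (illustrative only; the function name is ours and not from the paper's code) computes the L_p^p distance, sum_i |x_i - y_i|^p for 0 < p < 1, that the abstract refers to. Because t -> t^p is concave on [0, inf) with value 0 at 0, it is subadditive, so this componentwise sum is a true metric (the triangle inequality holds) even though the corresponding L_p "norm" (sum |.|^p)^(1/p) is only a quasi-norm.

```python
import numpy as np


def lp_p_distance(x, y, p):
    """Fractional L_p^p distance: sum_i |x_i - y_i|^p, for 0 < p < 1.

    This is the metric (not the quasi-norm) with respect to which the
    paper's classifiers are globally Lipschitz. Illustrative helper,
    not the authors' implementation.
    """
    assert 0.0 < p < 1.0, "fractional threat model requires 0 < p < 1"
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return float(np.sum(diff ** p))


if __name__ == "__main__":
    p = 0.5
    x = np.array([0.0, 0.0])
    y = np.array([1.0, 1.0])
    z = np.array([2.0, 0.0])
    # Triangle inequality holds for the L_p^p metric:
    assert lp_p_distance(x, z, p) <= lp_p_distance(x, y, p) + lp_p_distance(y, z, p)
    print(lp_p_distance(x, y, p))  # 1^0.5 + 1^0.5 = 2.0
```

Note that as p -> 0, |t|^p approaches the 0/1 indicator of t != 0, so the L_p^p distance interpolates between the L_0 counting distance and the L_1 distance; this is why the fractional regime sits between the well-studied L_0 and L_1 threat models.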