Title
The Space of Adversarial Strategies
Authors
Abstract
Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet, our understanding of this phenomenon stems from a rather fragmented pool of knowledge; at present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on both robust and non-robust models, seven datasets, and three extended lp-based threat models incorporating compute costs, formalizing the Space of Adversarial Strategies. From our evaluation, we find attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain and threat models, and (2) go beyond the handful of known attacks used today.
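The two ideas in the abstract can be sketched in a few lines. Below, the component menus are hypothetical placeholders (the paper's actual surface and traveler taxonomy differs); they are chosen only so that the Cartesian product yields 576 combinations, mirroring the enumeration described above. The PEA is sketched as a per-input best-of envelope over attack results, which is why no single attack can outperform it.

```python
import itertools
import numpy as np

# Hypothetical component menus (NOT the paper's exact taxonomy) whose
# Cartesian product has 576 elements: 3 * 3 * 4 * 2 * 2 * 4 = 576.
surface_parts = {
    "loss":         ["loss_a", "loss_b", "loss_c"],            # 3 choices
    "saliency_map": ["identity", "map_a", "map_b"],            # 3 choices
}
traveler_parts = {
    "optimizer":      ["opt_a", "opt_b", "opt_c", "opt_d"],    # 4 choices
    "random_restart": [False, True],                           # 2 choices
    "change_of_vars": [False, True],                           # 2 choices
    "norm":           ["l0", "l1", "l2", "linf"],              # 4 choices
}

# Enumerate every combination of surface and traveler components.
attacks = list(itertools.product(
    *surface_parts.values(), *traveler_parts.values()))
assert len(attacks) == 576

def pareto_ensemble(results: np.ndarray) -> np.ndarray:
    """Sketch of the PEA's upper bound.

    results: (n_attacks, n_inputs) array of per-attack, per-input scores
    (e.g., model loss induced at a fixed lp budget; higher = stronger).
    Returns, for each input, the best score any enumerated attack achieves.
    """
    return results.max(axis=0)

# Toy example: two attacks, two inputs; each attack wins on one input,
# so the ensemble envelope dominates both individual attacks.
scores = np.array([[1.0, 3.0],
                   [2.0, 0.0]])
print(pareto_ensemble(scores))  # [2. 3.]
```

Measuring any single attack "relative to the PEA" then amounts to comparing its row of `scores` against this elementwise envelope.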