Paper Title
Protecting Classifiers From Attacks
Paper Authors
Paper Abstract
In multiple domains such as malware detection, automated driving systems, or fraud detection, classification algorithms are susceptible to being attacked by malicious agents willing to perturb the value of instance covariates to pursue certain goals. Such problems pertain to the field of adversarial machine learning and have been mainly dealt with, perhaps implicitly, through game-theoretic ideas with strong underlying common knowledge assumptions. These are not realistic in numerous application domains in relation to security and business competition. We present an alternative Bayesian decision theoretic framework that accounts for the uncertainty about the attacker's behavior using adversarial risk analysis concepts. In doing so, we also present core ideas in adversarial machine learning to a statistical audience. A key ingredient in our framework is the ability to sample from the distribution of originating instances given the, possibly attacked, observed ones. We propose an initial procedure based on approximate Bayesian computation usable during operations; within it, we simulate the attacker's problem taking into account our uncertainty about his elements. Large-scale problems require an alternative scalable approach implementable during the training stage. Globally, we are able to robustify statistical classification algorithms against malicious attacks.
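To make the approximate Bayesian computation step concrete, below is a minimal sketch of how one might sample originating instances given a possibly attacked observation. All modeling choices here are illustrative assumptions, not the paper's actual models: the Gaussian class-conditionals, the gamma-distributed attacker strength (encoding our uncertainty about the attacker's elements), the tolerance `eps`, and all function names are hypothetical.

```python
import numpy as np

# Rejection-ABC sketch: draw clean instances, push them through a
# stochastic attacker simulator, and keep those whose attacked
# version lands near the observed instance x_obs.
rng = np.random.default_rng(0)

def sample_clean_instance():
    """Draw (y, x) from an assumed generative model of clean data."""
    y = rng.integers(0, 2)                          # class prior p(y), illustrative
    x = rng.normal(loc=2.0 * y, scale=1.0, size=5)  # p(x | y), illustrative
    return y, x

def simulate_attack(x, y):
    """Simulate the attacker's perturbation of x.

    Uncertainty about the attacker's behavior is encoded by sampling
    his 'strength' from a distribution rather than fixing it, in the
    spirit of adversarial risk analysis.
    """
    if y == 0:                                    # assume only malicious (y=1) instances are attacked
        return x
    strength = rng.gamma(shape=2.0, scale=0.5)    # random attacker strength (assumption)
    return x - strength                           # shift covariates toward the benign region

def abc_posterior_samples(x_obs, n_samples=100, eps=1.5, max_tries=200_000):
    """Accept simulated clean instances whose attacked version is within eps of x_obs."""
    accepted = []
    for _ in range(max_tries):
        y, x = sample_clean_instance()
        if np.linalg.norm(simulate_attack(x, y) - x_obs) <= eps:
            accepted.append((y, x))
            if len(accepted) == n_samples:
                break
    return accepted

# Usage: a robustified prediction would average the classifier over
# the recovered originating instances instead of trusting x_obs directly.
x_obs = rng.normal(loc=1.0, scale=1.0, size=5)
samples = abc_posterior_samples(x_obs)
print(f"accepted {len(samples)} posterior samples")
```

As the abstract notes, this rejection-style procedure is usable during operations but does not scale to large problems, which motivates the alternative approach implemented during training.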