Title

Domain Knowledge Alleviates Adversarial Attacks in Multi-Label Classifiers

Authors

Stefano Melacci, Gabriele Ciravegna, Angelo Sotgiu, Ambra Demontis, Battista Biggio, Marco Gori, Fabio Roli

Abstract

Adversarial attacks on machine learning-based classifiers, along with defense mechanisms, have been widely studied in the context of single-label classification problems. In this paper, we shift the attention to multi-label classification, where the availability of domain knowledge on the relationships among the considered classes may offer a natural way to spot incoherent predictions, i.e., predictions associated with adversarial examples lying outside the training data distribution. We explore this intuition in a framework in which first-order logic knowledge is converted into constraints and injected into a semi-supervised learning problem. Within this setting, the constrained classifier learns to fulfill the domain knowledge over the marginal distribution, and can naturally reject samples with incoherent predictions. Even though our method does not exploit any knowledge of attacks during training, our experimental analysis surprisingly unveils that domain-knowledge constraints can help detect adversarial examples effectively, especially if such constraints are not known to the attacker.
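The rejection mechanism described in the abstract can be illustrated with a minimal sketch: a first-order rule over class labels is relaxed into a continuous constraint on the predicted probabilities, and a sample is rejected when its predictions violate the constraint too strongly. The rule ("dog implies animal"), the product t-norm relaxation, the class names, and the threshold below are all illustrative assumptions, not the authors' exact formulation.

```python
# Hypothetical example of constraint-based rejection of incoherent
# multi-label predictions, loosely following the idea in the abstract.

def rule_violation(p_dog: float, p_animal: float) -> float:
    """Degree to which the rule 'dog => animal' is violated.

    Under the product t-norm, the implication a => b is relaxed as
    1 - a * (1 - b); the violation is its complement a * (1 - b).
    """
    return p_dog * (1.0 - p_animal)

def reject(preds: dict, threshold: float = 0.2) -> bool:
    """Reject a sample whose predictions are incoherent with the rule."""
    return rule_violation(preds["dog"], preds["animal"]) > threshold

# A coherent prediction: high 'dog' and high 'animal' -> low violation.
clean = {"dog": 0.9, "animal": 0.95}
# An incoherent prediction (e.g., after an attack flips 'animal'):
# high 'dog' but low 'animal' -> high violation.
attacked = {"dog": 0.9, "animal": 0.1}

print(reject(clean))     # accepted: violation 0.045 is below threshold
print(reject(attacked))  # rejected: violation 0.81 exceeds threshold
```

In the paper's framework the constraints are also enforced during semi-supervised training, so the classifier itself learns to produce rule-consistent outputs on the data distribution; the attacker who is unaware of the constraints then tends to produce inputs whose predictions violate them, which is what the detector exploits.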
