Paper Title
Preventing Clean Label Poisoning using Gaussian Mixture Loss
Paper Authors
Paper Abstract
Since 2014, when Szegedy et al. showed that carefully designed perturbations of the input can lead Deep Neural Networks (DNNs) to misclassify it, there has been ongoing research on making DNNs more robust to such malicious perturbations. In this work, we consider a poisoning attack called the clean label poisoning attack (CLPA). The goal of CLPA is to inject seemingly benign instances that can drastically change the decision boundary of a DNN, so that subsequent queries at test time are misclassified. We argue that a strong defense against CLPA can be embedded into the model during training by constraining the features of the network in the penultimate layer to follow a large-margin Gaussian Mixture (LGM) distribution. With such prior knowledge, we can systematically evaluate how unusual an example is, given the label it claims to have. We demonstrate our built-in defense via experiments on the MNIST and CIFAR datasets. We train two models on each dataset: one via softmax, the other via LGM. We show that using LGM can substantially reduce the effectiveness of CLPA without the additional overhead of data sanitization. The code to reproduce our results is available online.
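The core mechanism described in the abstract is to model the penultimate-layer features with one Gaussian per class (the LGM loss) and then score how unusual an example's features are under the class its label claims. The snippet below is a minimal PyTorch sketch of such an LGM-style loss; the class name LGMLoss, the hyperparameters margin and lambda_lik, and the identity-covariance simplification are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LGMLoss(nn.Module):
    """Sketch of a large-margin Gaussian Mixture (LGM) style loss over
    penultimate features: each class k gets a learnable mean mu_k (identity
    covariance assumed for simplicity). The negative squared distance to each
    mean acts as the logit, the true-class distance is enlarged by a margin,
    and a likelihood term pulls features toward their class mean."""

    def __init__(self, num_classes, feat_dim, margin=0.1, lambda_lik=0.1):
        super().__init__()
        self.means = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.1)
        self.margin = margin          # margin applied to the true-class distance
        self.lambda_lik = lambda_lik  # weight of the likelihood regularizer

    def forward(self, feats, labels):
        # Squared Euclidean distance of each feature to every class mean.
        dist = torch.cdist(feats, self.means) ** 2            # (batch, num_classes)

        # Enlarge the distance of the claimed class by (1 + margin) so the
        # classifier must satisfy a margin to remain confident.
        one_hot = F.one_hot(labels, self.means.size(0)).float()
        logits = -0.5 * dist * (1.0 + self.margin * one_hot)

        # Cross-entropy over the Gaussian posteriors + likelihood regularizer.
        cls_loss = F.cross_entropy(logits, labels)
        lik_loss = 0.5 * (dist * one_hot).sum(dim=1).mean()
        return cls_loss + self.lambda_lik * lik_loss
```

Under this kind of model, the same per-class distances double as an anomaly score: a training example whose penultimate features lie far in the tail of the Gaussian of its claimed class can be flagged as a potential clean-label poison, which is the built-in screening the abstract refers to.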