Paper Title
Content-Adaptive Pixel Discretization to Improve Model Robustness
Paper Authors
Paper Abstract
Preprocessing defenses such as pixel discretization are appealing due to their simplicity as a means of removing adversarial perturbations. However, they have been shown to be ineffective except on simple datasets like MNIST. We hypothesize that existing discretization approaches failed because using a fixed codebook for the entire dataset limits their ability to balance image representation and codeword separability. We first formally prove that adaptive codebooks can provide stronger robustness guarantees than fixed codebooks as a preprocessing defense on some datasets. Based on that insight, we propose a content-adaptive pixel discretization defense called Essential Features, which discretizes each image to a per-image adaptive codebook to reduce the color space. We then find that Essential Features can be further optimized by applying adaptive blurring before discretization, pushing perturbed pixel values back toward their original values before the codebook is determined. Against adaptive attacks, we show that content-adaptive pixel discretization extends the range of datasets that benefit in terms of both L_2 and L_infinity robustness, covering datasets where fixed codebooks were previously found to fail. Our findings suggest that content-adaptive pixel discretization should be part of the repertoire for making models robust.
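To make the core idea concrete, the following is a minimal sketch of per-image adaptive codebook discretization. It fits a small codebook to each individual image's pixels (here with plain k-means clustering, an illustrative assumption; the paper's actual Essential Features procedure, including its adaptive blurring step, is not reproduced here) and maps every pixel to its nearest codeword:

```python
import numpy as np

def adaptive_codebook_discretize(image, k=4, iters=10, seed=0):
    """Discretize an image to a per-image codebook of k colors.

    Sketch only: the codebook is fit to *this* image's pixels via
    a basic k-means loop, rather than being fixed for the whole
    dataset. Function name and k-means choice are assumptions,
    not the paper's exact method.
    """
    h, w, c = image.shape
    pixels = image.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)
    # Initialize codewords from randomly chosen pixels of this image.
    codebook = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest codeword (Euclidean distance).
        dists = np.linalg.norm(pixels[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each codeword to the mean of the pixels assigned to it.
        for j in range(k):
            mask = labels == j
            if mask.any():
                codebook[j] = pixels[mask].mean(axis=0)
    # Replace every pixel with its codeword: the color space shrinks to k values.
    return codebook[labels].reshape(h, w, c), codebook
```

Because the codebook is recomputed per image, it can track that image's dominant colors, which is the flexibility the abstract argues fixed dataset-wide codebooks lack.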