Paper Title
Don't sweat the small stuff, classify the rest: Sample Shielding to protect text classifiers against adversarial attacks
Paper Authors
Paper Abstract
Deep learning (DL) is being used extensively for text classification. However, researchers have demonstrated the vulnerability of such classifiers to adversarial attacks. Attackers modify the text in a way that misleads the classifier while keeping the original meaning close to intact. State-of-the-art (SOTA) attack algorithms follow the general principle of making minimal changes to the text so as not to jeopardize semantics. Taking advantage of this, we propose a novel and intuitive defense strategy called Sample Shielding. It is attacker and classifier agnostic, requires no reconfiguration of the classifier or external resources, and is simple to implement. Essentially, we sample subsets of the input text, classify them, and summarize these into a final decision. We shield three popular DL text classifiers with Sample Shielding and test their resilience against four SOTA attackers across three datasets in a realistic threat setting. Even when given the advantage of knowing about our shielding strategy, the adversary's attack success rate is at most 10% with only one exception, and often below 5%. Additionally, Sample Shielding maintains near-original accuracy when applied to original texts. Crucially, we show that the "make minimal changes" approach of SOTA attackers leads to critical vulnerabilities that can be defended against with an intuitive sampling strategy.
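
To make the sample-classify-summarize procedure from the abstract concrete, here is a minimal sketch in Python. It assumes a generic classify callable that maps a string to a label; the number of samples, the fraction of words kept per sample, and the use of majority voting are illustrative assumptions, since the abstract does not specify the paper's exact sampling or aggregation parameters.

import random
from collections import Counter

def sample_shield_predict(text, classify, num_samples=10, keep_ratio=0.7, seed=0):
    """Sketch of Sample Shielding: classify random word subsets of the
    input and aggregate the predictions into a final decision.

    classify     : assumed callable, str -> label (hypothetical interface)
    num_samples  : illustrative choice, not from the paper
    keep_ratio   : illustrative fraction of words kept per subset
    """
    rng = random.Random(seed)
    words = text.split()
    votes = []
    for _ in range(num_samples):
        # Keep a random subset of the words, preserving their order.
        kept = [w for w in words if rng.random() < keep_ratio]
        votes.append(classify(" ".join(kept) if kept else text))
    # Summarize the subset predictions; majority vote is one plausible
    # aggregation (an assumption, not necessarily the paper's scheme).
    return Counter(votes).most_common(1)[0][0]

Under a sketch like this, an attacker's minimal perturbations fall outside many of the sampled subsets, so the aggregated decision tends to recover the original label, which is the intuition the abstract describes.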