Paper Title

A Simple Yet Efficient Method for Adversarial Word-Substitute Attack

Paper Authors

Tianle Li, Yi Yang

Paper Abstract

NLP researchers have proposed various word-substitute black-box attacks that can fool text classification models. In such an attack, an adversary keeps sending crafted adversarial queries to the target model until it achieves the intended outcome. State-of-the-art attack methods usually require hundreds or thousands of queries to find one adversarial example. In this paper, we study whether a sophisticated adversary can attack the system with far fewer queries. We propose a simple yet efficient method that reduces the average number of adversarial queries by 3 to 30 times while maintaining attack effectiveness. This research highlights that an adversary can fool a deep NLP model at much lower cost.
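To make the query-based setting in the abstract concrete, here is a minimal sketch of a generic greedy word-substitute black-box attack loop. It is only an illustration of that setting, not the method proposed in the paper; the names `model_predict` and `get_synonyms`, and the greedy single-word substitution strategy, are assumptions introduced for the example. Each call to the target model counts as one adversarial query, which is the cost the paper aims to reduce.

```python
from typing import Callable, List, Optional, Tuple

def greedy_word_substitute_attack(
    words: List[str],
    true_label: int,
    model_predict: Callable[[str], int],        # black-box target model: text -> predicted label
    get_synonyms: Callable[[str], List[str]],   # candidate substitutes for a word
) -> Tuple[Optional[List[str]], int]:
    """Try single-word synonym substitutions until the predicted label flips.

    Returns the adversarial word list (or None if the attack fails) and the
    number of queries spent. Real attacks of this kind typically need hundreds
    or thousands of such queries per example.
    """
    queries = 0
    for i, _ in enumerate(words):
        for candidate in get_synonyms(words[i]):
            trial = words[:i] + [candidate] + words[i + 1:]
            queries += 1                          # one query to the target model
            if model_predict(" ".join(trial)) != true_label:
                return trial, queries             # label flipped: attack succeeded
    return None, queries                          # no substitution flipped the label

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end (purely illustrative).
    toy_model = lambda text: 1 if "good" in text else 0
    toy_synonyms = {"good": ["great", "fine"], "movie": ["film"]}
    adv, n_queries = greedy_word_substitute_attack(
        ["good", "movie"],
        true_label=1,
        model_predict=toy_model,
        get_synonyms=lambda w: toy_synonyms.get(w, []),
    )
    print(adv, n_queries)
```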
