Paper Title
Defense against Adversarial Attacks in NLP via Dirichlet Neighborhood Ensemble
Paper Authors
Paper Abstract
Although neural networks have achieved prominent performance on many natural language processing (NLP) tasks, they are vulnerable to adversarial examples. In this paper, we propose Dirichlet Neighborhood Ensemble (DNE), a randomized smoothing method for training a robust model to defend against substitution-based attacks. During training, DNE forms virtual sentences by sampling embedding vectors for each word in an input sentence from the convex hull spanned by the word and its synonyms, and augments the training data with these virtual sentences. In this way, the model is robust to adversarial attacks while maintaining performance on the original clean data. DNE is agnostic to the network architecture and scales to large models for NLP applications. We demonstrate through extensive experiments that our method consistently outperforms recently proposed defense methods by a significant margin across different network architectures and multiple datasets.
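To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of how a virtual embedding could be formed: for each word, convex-combination weights are drawn from a Dirichlet distribution over the word and its synonyms, and the weighted average of their embedding vectors serves as the input embedding during training. The names `embedding`, `synonym_table`, and the concentration parameter `alpha` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def dirichlet_virtual_embedding(word, synonyms, embedding, alpha=1.0, rng=None):
    """Sample a virtual embedding from the convex hull spanned by `word`
    and its `synonyms`. `embedding` is assumed to map tokens to NumPy
    vectors; `alpha` is the Dirichlet concentration (illustrative)."""
    rng = rng or np.random.default_rng()
    candidates = [word] + list(synonyms)
    vectors = np.stack([embedding[w] for w in candidates])     # (k, d)
    weights = rng.dirichlet(alpha * np.ones(len(candidates)))  # convex weights summing to 1
    return weights @ vectors                                   # a point inside the convex hull

# Hypothetical usage: build a "virtual sentence" for data augmentation.
# embedding = {...}        # pretrained word vectors
# synonym_table = {...}    # e.g. derived from a synonym lexicon
# virtual_sentence = [
#     dirichlet_virtual_embedding(w, synonym_table.get(w, []), embedding)
#     for w in sentence_tokens
# ]
```

Because the sampled point always lies inside the convex hull of the word and its synonyms, the virtual sentence covers the embedding regions a substitution-based attacker could reach, which is what makes training on such samples a smoothing-style defense.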