Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

Paper Title

Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition

Authors

Chen, Guangke; Zhao, Zhe; Song, Fu; Chen, Sen; Fan, Lingling; Wang, Feng; Wang, Jiashui

Abstract

Speaker recognition systems (SRSs) have recently been shown to be vulnerable to adversarial attacks, raising significant security concerns. In this work, we systematically investigate transformation-based and adversarial-training-based defenses for securing SRSs. Based on the characteristics of SRSs, we present 22 diverse transformations and thoroughly evaluate them using 7 recent promising adversarial attacks (4 white-box and 3 black-box) on speaker recognition. With careful regard for best practices in defense evaluations, we analyze the strength of transformations to withstand adaptive attacks. We also evaluate and explain their effectiveness against adaptive attacks when combined with adversarial training. Our study provides many useful insights and findings, many of which are new or inconsistent with the conclusions in the image and speech recognition domains. For example, variable and constant bit rate speech compressions perform differently, and some non-differentiable transformations remain effective against current promising evasion techniques that often work well in the image domain. We demonstrate that the proposed novel feature-level transformation combined with adversarial training is rather effective compared to adversarial training alone in a complete white-box setting, e.g., increasing the accuracy by 13.62% and the attack cost by two orders of magnitude, while other transformations do not necessarily improve the overall defense capability. This work sheds further light on the research directions in this field. We also release our evaluation platform SPEAKERGUARD to foster further research.
