Paper Title
Inaudible Adversarial Perturbations for Targeted Attack in Speaker Recognition
Paper Authors
Abstract
Speaker recognition is a popular topic in biometric authentication, and many deep learning approaches have achieved extraordinary performance. However, it has been shown in both image and speech applications that deep neural networks are vulnerable to adversarial examples. In this study, we aim to exploit this weakness to perform targeted adversarial attacks against an x-vector based speaker recognition system. We propose generating inaudible adversarial perturbations that achieve targeted white-box attacks on the speaker recognition system, based on the psychoacoustic principle of frequency masking. Specifically, we constrain the perturbation under the masking threshold of the original audio, instead of using a common l_p norm to measure it. Experiments on the Aishell-1 corpus show that our approach yields an attack success rate of up to 98.5% against speaker targets of arbitrary gender, while remaining indistinguishable to listeners. Furthermore, we also achieve an effective speaker attack when applying the proposed approach to a completely irrelevant waveform, such as music.
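The core idea of the constraint can be illustrated with a minimal sketch. This is not the paper's exact procedure: the threshold model below (a fixed dB offset under the original frame's own magnitude spectrum) is a placeholder assumption standing in for a proper psychoacoustic frequency-masking model, and the function names are hypothetical. It only shows the mechanism of bounding the perturbation's spectral magnitude per frequency bin by a signal-dependent threshold rather than by a global l_p norm.

```python
import numpy as np

def masking_threshold(frame, offset_db=12.0):
    """Toy threshold: the frame's magnitude spectrum attenuated by offset_db.
    A real system would use a psychoacoustic masking model instead."""
    mag = np.abs(np.fft.rfft(frame))
    return mag * 10 ** (-offset_db / 20.0)

def constrain_perturbation(perturb, frame):
    """Scale each spectral bin of the perturbation down to the threshold."""
    thresh = masking_threshold(frame)
    spec = np.fft.rfft(perturb)
    mag = np.abs(spec)
    # Per-bin scaling factor: leave bins already under the threshold intact.
    scale = np.minimum(1.0, thresh / np.maximum(mag, 1e-12))
    return np.fft.irfft(spec * scale, n=len(perturb))

rng = np.random.default_rng(0)
# Original audio frame: a 440 Hz tone at 16 kHz sampling rate.
frame = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)
delta = 0.5 * rng.standard_normal(512)        # raw adversarial perturbation
delta_c = constrain_perturbation(delta, frame)

# Every spectral bin of the constrained perturbation lies under the threshold.
ok = bool(np.all(np.abs(np.fft.rfft(delta_c)) <= masking_threshold(frame) + 1e-9))
print(ok)
```

In the attack itself, such a projection (or an equivalent penalty term) would be applied inside the optimization loop that crafts the perturbation against the speaker recognition model, so that the final adversarial audio stays below the masker at every frequency and is therefore inaudible.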