Paper Title

Hear No Evil: Towards Adversarial Robustness of Automatic Speech Recognition via Multi-Task Learning

Paper Authors

Nilaksh Das, Duen Horng Chau

Paper Abstract

As automatic speech recognition (ASR) systems are now being widely deployed in the wild, the increasing threat of adversarial attacks raises serious questions about the security and reliability of using such systems. On the other hand, multi-task learning (MTL) has shown success in training models that can resist adversarial attacks in the computer vision domain. In this work, we investigate the impact of performing such multi-task learning on the adversarial robustness of ASR models in the speech domain. We conduct extensive MTL experimentation by combining semantically diverse tasks such as accent classification and ASR, and evaluate a wide range of adversarial settings. Our thorough analysis reveals that performing MTL with semantically diverse tasks consistently makes it harder for an adversarial attack to succeed. We also discuss in detail the serious pitfalls and their related remedies that have a significant impact on the robustness of MTL models. Our proposed MTL approach shows considerable absolute improvements in adversarially targeted WER ranging from 17.25 up to 59.90 compared to single-task learning baselines (attention decoder and CTC respectively). Ours is the first in-depth study that uncovers adversarial robustness gains from multi-task learning for ASR.
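The abstract describes jointly training a primary ASR objective alongside a semantically diverse auxiliary task (accent classification), with CTC and attention-decoder single-task baselines. As a rough illustration of how such a joint objective can be wired up, below is a minimal PyTorch sketch of a shared encoder with two task heads; the encoder architecture, head designs, and mixing weight `alpha` are illustrative assumptions, not the paper's implementation.

```python
# Minimal multi-task learning sketch (not the authors' code): a shared
# acoustic encoder feeds an ASR head trained with CTC and an auxiliary
# accent-classification head trained with cross-entropy. All sizes and
# the mixing weight `alpha` are illustrative assumptions.
import torch
import torch.nn as nn

class SharedEncoderMTL(nn.Module):
    def __init__(self, n_mels=80, hidden=256, vocab_size=32, n_accents=8):
        super().__init__()
        # Shared encoder used by both tasks.
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True)
        self.asr_head = nn.Linear(hidden, vocab_size)    # per-frame token logits
        self.accent_head = nn.Linear(hidden, n_accents)  # utterance-level logits

    def forward(self, feats):
        enc, _ = self.encoder(feats)                      # (B, T, H)
        asr_logits = self.asr_head(enc)                   # (B, T, V)
        accent_logits = self.accent_head(enc.mean(dim=1)) # (B, n_accents)
        return asr_logits, accent_logits

model = SharedEncoderMTL()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss = nn.CrossEntropyLoss()
alpha = 0.3  # assumed task-mixing weight, not a value from the paper

# Dummy batch: 4 utterances, 100 frames each, transcripts of length 20.
feats = torch.randn(4, 100, 80)
targets = torch.randint(1, 32, (4, 20))  # token ids; 0 is the CTC blank
input_lens = torch.full((4,), 100, dtype=torch.long)
target_lens = torch.full((4,), 20, dtype=torch.long)
accent_labels = torch.randint(0, 8, (4,))

asr_logits, accent_logits = model(feats)
# nn.CTCLoss expects (T, B, V) log-probabilities.
log_probs = asr_logits.log_softmax(-1).transpose(0, 1)
loss = ctc_loss(log_probs, targets, input_lens, target_lens) \
       + alpha * ce_loss(accent_logits, accent_labels)
loss.backward()
```

Under this kind of setup, the shared encoder must learn representations that satisfy both objectives, which is the intuition the paper tests: gradients from a semantically different auxiliary task make it harder for a single adversarial perturbation to steer the ASR output.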
