Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems

Paper Title

Attacks as Defenses: Designing Robust Audio CAPTCHAs Using Attacks on Automatic Speech Recognition Systems

Authors

Abdullah, Hadi; Karlekar, Aditya; Prasad, Saurabh; Rahman, Muhammad Sajidur; Blue, Logan; Bauer, Luke A.; Bindschaedler, Vincent; Traynor, Patrick

Abstract

Audio CAPTCHAs are supposed to provide a strong defense for online resources; however, advances in speech-to-text mechanisms have rendered these defenses ineffective. Audio CAPTCHAs cannot simply be abandoned, as they are specifically named by the W3C as important enablers of accessibility. Accordingly, demonstrably more robust audio CAPTCHAs are important to the future of a secure and accessible Web. We look to recent literature on attacks on speech-to-text systems for inspiration for the construction of robust, principle-driven audio defenses. We begin by comparing 20 recent attack papers, classifying and measuring their suitability to serve as the basis of new "robust to transcription" but "easy for humans to understand" CAPTCHAs. After showing that none of these attacks alone is sufficient, we propose a new mechanism that is both comparatively intelligible (evaluated through a user study) and hard to automatically transcribe (i.e., $P({\rm transcription}) = 4 \times 10^{-5}$). Finally, we demonstrate that our audio samples have a high probability of being detected as CAPTCHAs when given to speech-to-text systems ($P({\rm evasion}) = 1.77 \times 10^{-4}$). In so doing, we not only demonstrate a CAPTCHA that is approximately four orders of magnitude more difficult to crack, but also show that such systems can be designed based on insights from attack papers that exploit the differences between the ways humans and computers process audio.