Paper Title
Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection
Paper Authors
Abstract
Audio DeepFakes allow the creation of high-quality, convincing utterances and therefore pose a threat due to their potential applications, such as impersonation or fake news. Methods for detecting these manipulations should be characterized by good generalization and stability, leading to robustness against attacks conducted with techniques that are not explicitly included in the training. In this work, we introduce the Attack Agnostic Dataset, a combination of two audio DeepFake datasets and one anti-spoofing dataset that, thanks to the disjoint use of attacks, can lead to better generalization of detection methods. We present a thorough analysis of current DeepFake detection methods and consider different audio features (front-ends). In addition, we propose a model based on LCNN with an LFCC and mel-spectrogram front-end, which not only is characterized by good generalization and stability results but also shows improvement over the LFCC-based model: we decrease the standard deviation on all folds and the EER in two folds by up to 5%.
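The abstract reports results as an equal error rate (EER) per fold and a standard deviation over folds. As a rough illustration of how such figures can be computed, the sketch below uses NumPy only. It is not the authors' evaluation code: the function names `compute_eer` and `summarize_folds` are hypothetical, the convention that a higher score means "more likely bona fide" is an assumption, and the use of the population standard deviation (NumPy's default) is likewise an assumption, since the abstract does not state which variant is used.

```python
import numpy as np


def compute_eer(bona_fide_scores, spoof_scores):
    """Approximate the equal error rate (EER) by a threshold sweep.

    Assumption: a higher score means the utterance is more likely bona fide.
    """
    bona_fide = np.asarray(bona_fide_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)

    # Candidate thresholds: every distinct score observed in either class.
    thresholds = np.unique(np.concatenate([bona_fide, spoof]))

    # False rejection rate: bona fide utterances scored below the threshold.
    frr = np.array([(bona_fide < t).mean() for t in thresholds])
    # False acceptance rate: spoofed utterances scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def summarize_folds(fold_eers):
    """Return (mean, standard deviation) of per-fold EER values.

    Assumption: population standard deviation (ddof=0).
    """
    eers = np.asarray(fold_eers, dtype=float)
    return eers.mean(), eers.std()


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    eer = compute_eer([0.9, 0.8, 0.3, 0.6], [0.1, 0.2, 0.7, 0.4])
    mean_eer, std_eer = summarize_folds([eer, 0.20, 0.30])
    print(f"EER: {eer:.3f}, mean over folds: {mean_eer:.3f}, std: {std_eer:.3f}")
```

In this reading, a lower standard deviation over folds indicates that detection quality depends less on which attacks were held out of training, which is the sense of "stability" used in the abstract.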