Paper title
MERLIN -- Malware Evasion with Reinforcement LearnINg
Paper authors
Paper abstract
In addition to signature-based and heuristics-based detection techniques, machine learning (ML) is widely used to generalize to new, never-before-seen malicious software (malware). However, it has been demonstrated that ML models can be fooled by tricking the classifier into returning an incorrect label. Such studies usually rely on a prediction score that is vulnerable to gradient-based attacks. In a more realistic setting, where an attacker has very little information about the outputs of a malware detection engine, only modest evasion rates are achieved. In this paper, we propose a method using reinforcement learning with the DQN and REINFORCE algorithms to challenge two state-of-the-art ML-based detection engines (MalConv & EMBER) and a commercial antivirus (AV) product classified by Gartner as a leader. Our method combines several actions that modify a Windows Portable Executable (PE) file without breaking its functionality. It also identifies which actions perform better and compiles a detailed vulnerability report to help mitigate the evasion. We demonstrate that REINFORCE achieves very good evasion rates even on a commercial AV with limited available information.
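To make the REINFORCE setting mentioned in the abstract more concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a stateless softmax policy over four abstract placeholder actions (`action_a` to `action_d`, hypothetical names that do not touch any file) and a simulated black-box detector (`toy_detector`, also hypothetical) that returns only a hard label. The episode length, learning rate, and baseline update are illustrative choices, not values from the paper.

```python
import math
import random

# Hypothetical placeholder actions; nothing here reads or modifies real files.
ACTIONS = ["action_a", "action_b", "action_c", "action_d"]
MAX_STEPS = 3  # assumed cap on actions per episode


def toy_detector(applied):
    """Simulated black box that returns only a hard label (True = detected).

    Toy assumption: the label flips once both action_b and action_d were applied.
    """
    return not ({"action_b", "action_d"} <= set(applied))


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_episode(prefs, rng):
    """Sample actions until the toy detector is evaded or MAX_STEPS is reached."""
    applied, steps = [], []
    for _ in range(MAX_STEPS):
        probs = softmax(prefs)
        a = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        steps.append((a, probs))
        applied.append(ACTIONS[a])
        if not toy_detector(applied):
            return steps, 1.0  # reward 1 for evasion
    return steps, 0.0  # reward 0 if still detected


def train(episodes=3000, lr=0.1, seed=0):
    """REINFORCE with a running-average baseline on the toy problem."""
    rng = random.Random(seed)
    prefs = [0.0] * len(ACTIONS)
    baseline = 0.0
    for _ in range(episodes):
        steps, reward = run_episode(prefs, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        for a, probs in steps:
            for i in range(len(prefs)):
                # Gradient of log softmax: indicator(i == a) - probs[i]
                grad = (1.0 if i == a else 0.0) - probs[i]
                prefs[i] += lr * advantage * grad
    return prefs


if __name__ == "__main__":
    learned = softmax(train())
    for name, p in zip(ACTIONS, learned):
        print(f"{name}: {p:.3f}")
```

In this toy setup the learned probabilities should concentrate on the two actions the simulated detector is sensitive to, which loosely mirrors the abstract's point that the method identifies which actions perform better. A real agent would condition the policy on features of the file and query an actual detection engine, neither of which is modeled here.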