Malware Analysis with Symbolic Execution and Graph Kernel

Paper title

Malware Analysis with Symbolic Execution and Graph Kernel

Authors

Charles-Henry Bertrand Van Ouytsel, Axel Legay

Abstract

Malware analysis techniques are divided into static and dynamic analysis. Both can be defeated by evasion techniques such as obfuscation. In a series of works, the authors have promoted combining symbolic execution with machine learning to avoid such traps. Most of those works rely on natural graph-based representations that can then be plugged into graph-based learning algorithms such as gSpan. There are two main problems with this approach. The first is the cost of computing the graph. Indeed, working with graphs requires one to compute and represent the entire state space of the file under analysis. As such computation is too cumbersome, the techniques often rely on strategies that compute a representative subgraph of the behaviors. Unfortunately, efficient graph-building strategies remain little explored. The second problem is the classification itself. Graph-based machine learning algorithms rely on comparing the biggest common structures. This sidelines small but specific parts of the malware signature. In addition, it does not allow us to work with efficient algorithms such as support vector machines. We propose a new, efficient, open-source toolchain for machine learning-based classification. We also explore how graph-kernel techniques can be used in the process. We focus on the 1-dimensional Weisfeiler-Lehman kernel, which can capture local similarities between graphs. Our experimental results show that our approach outperforms existing ones by an impressive factor.
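To make the idea of a 1-dimensional Weisfeiler-Lehman kernel concrete, the following is a minimal sketch of the textbook subtree variant, not the toolchain described in the abstract. It assumes undirected graphs given as adjacency dictionaries, with node labels standing in for system call names. The function names (`wl_features`, `wl_kernel`) and the two toy graphs are hypothetical and chosen only for illustration.

```python
from collections import Counter


def wl_features(adj, labels, iterations=2):
    """Count the node labels produced by `iterations` rounds of 1-WL refinement.

    adj:    dict mapping each node to a list of its neighbours (undirected)
    labels: dict mapping each node to its initial label
    Assumption: labels contain no '(', ')' or ',' characters, so the
    concatenated strings below stay unambiguous.
    """
    current = {node: str(label) for node, label in labels.items()}
    features = Counter(current.values())
    for _ in range(iterations):
        refined = {}
        for node in adj:
            # A node's new label is its old label plus the sorted
            # multiset of its neighbours' old labels.
            neighbourhood = sorted(current[n] for n in adj[node])
            refined[node] = current[node] + "(" + ",".join(neighbourhood) + ")"
        current = refined
        features.update(current.values())
    return features


def wl_kernel(graph_a, graph_b, iterations=2):
    """Dot product of the two label-count vectors (WL subtree kernel)."""
    fa = wl_features(*graph_a, iterations=iterations)
    fb = wl_features(*graph_b, iterations=iterations)
    return sum(count * fb[label] for label, count in fa.items())


# Hypothetical toy graphs: three calls in a chain, differing in the middle call.
chain = {0: [1], 1: [0, 2], 2: [1]}
sample_a = (chain, {0: "open", 1: "read", 2: "close"})
sample_b = (chain, {0: "open", 1: "write", 2: "close"})

print(wl_kernel(sample_a, sample_a))
print(wl_kernel(sample_a, sample_b))
```

Because each refined label only encodes a node's immediate neighbourhood at the previous round, two graphs score points for every small matching neighbourhood, even when their largest common structure is small. The resulting pairwise values form a kernel matrix that could, in principle, be handed to a kernel-based classifier such as a support vector machine.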
