Learning to map source code to software vulnerability using code-as-a-graph

Paper title

Learning to map source code to software vulnerability using code-as-a-graph

Authors

Sahil Suneja, Yunhui Zheng, Yufan Zhuang, Jim Laredo, Alessandro Morari

Abstract

We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. Specifically, we ask whether signatures of vulnerabilities in source code can be learned from its graph representation, in terms of relationships between nodes and edges. We create a pipeline we call AI4VA, which first encodes a source code sample into a Code Property Graph. The extracted graph is then vectorized in a manner that preserves its semantic information. A Gated Graph Neural Network is then trained on several such graphs to automatically extract templates that differentiate the graph of a vulnerable sample from that of a healthy one. Our model outperforms static analyzers, classic machine learning, and CNN- and RNN-based deep learning models on two of the three datasets we experiment with. We thus show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches. (Submitted Oct 2019, Paper #28, ICST)
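To make the pipeline in the abstract more concrete, the following is a minimal sketch of its three stages (graph encoding, vectorization, gated message passing) on a toy graph. It is not the AI4VA implementation: the node types, the edges, the one-hot vectorization, and the scalar gate are all illustrative assumptions, and a real Gated Graph Neural Network uses a learned GRU cell with per-edge-type weights rather than the fixed gate shown here.

```python
import math

# Hypothetical toy graph for a snippet like: buf = malloc(n); strcpy(buf, src)
# Node types and edges are illustrative, not the paper's actual schema.
NODE_TYPES = ["Call", "Identifier", "Assignment"]

nodes = {
    0: "Assignment",   # buf = malloc(n)
    1: "Call",         # malloc(n)
    2: "Identifier",   # buf
    3: "Call",         # strcpy(buf, src)
}
# (source, target) pairs; a real code property graph also labels edge kinds
edges = [(0, 1), (0, 2), (2, 3)]


def one_hot(node_type):
    """Vectorize a node by its type (a stand-in for richer semantic features)."""
    return [1.0 if t == node_type else 0.0 for t in NODE_TYPES]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def propagate(states, edges, gate_bias=0.0):
    """One simplified gated message-passing step.

    Each node sums the states of its in-neighbours, then blends that
    message with its own state through a scalar gate. This is an
    assumption-laden simplification of a GGNN update, with no learned
    parameters.
    """
    dim = len(next(iter(states.values())))
    messages = {n: [0.0] * dim for n in states}
    for src, dst in edges:
        for i in range(dim):
            messages[dst][i] += states[src][i]
    new_states = {}
    for n, h in states.items():
        m = messages[n]
        z = sigmoid(sum(m) + gate_bias)  # gate: share of the message to admit
        new_states[n] = [(1 - z) * h[i] + z * m[i] for i in range(dim)]
    return new_states


def readout(states):
    """Graph-level vector: element-wise sum over all node states."""
    dim = len(next(iter(states.values())))
    return [sum(h[i] for h in states.values()) for i in range(dim)]


states = {n: one_hot(t) for n, t in nodes.items()}
states = propagate(states, edges)
graph_vector = readout(states)
```

In a trained model, `graph_vector` would feed a classifier that labels the sample as vulnerable or healthy. The point of the sketch is only that each node's representation comes to depend on its neighbours in the graph, which a linear token sequence does not capture directly.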
