Title

DCDetector: An IoT terminal vulnerability mining system based on distributed deep ensemble learning under source code representation

Author

Zhou, Wen

Abstract

Context: Attacks that exploit vulnerabilities in IoT system infrastructure platforms have become a main battlefield of network security. Most traditional vulnerability mining methods rely on vulnerability detection tools to discover vulnerabilities. However, because these tools are inflexible and limited in the file sizes they can handle, their scalability is relatively low, and they cannot be applied to large-scale power big data.

Objective: The goal of this research is to intelligently detect vulnerabilities in source code written in high-level languages such as C/C++. To this end, we propose a code representation based on source code slices related to sensitive statements, and we detect vulnerabilities with a distributed deep ensemble learning model.

Method: This paper proposes a new directional vulnerability mining method based on parallel ensemble learning to address vulnerability mining on large-scale data. Sensitive functions and statements are extracted to build a sensitive statement library of vulnerable code. Vulnerable code slices of higher granularity are then generated from the AST stream. A random sampling module draws samples of these slices, which are vectorized with doc2vec; Bi-LSTM trainers are trained in a distributed manner to produce different classification results, and the final classification result is obtained by voting.

Results: Based on this method, we design and implement a software vulnerability mining system built on distributed deep ensemble learning, called DCDetector. It makes accurate predictions by using the syntactic information of the code and is an effective method for analyzing large-scale vulnerability data.

Conclusion: Experiments show that this method can reduce the false positive rate of traditional static analysis and improve the performance and accuracy of machine learning.
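The abstract names the pipeline stages but does not give their implementation. The Python sketch below, which assumes only the standard library, illustrates three of the ideas in simplified form: locating statements that call sensitive functions, drawing a random bootstrap sample for each base learner, and combining the learners by majority vote. The sensitive-function list `SENSITIVE_FUNCS`, the line-window `slice_around` (a stand-in for AST-based slicing), and the token-weight classifier `train_stub` (a stand-in for doc2vec vectorization plus a Bi-LSTM) are hypothetical simplifications, not DCDetector's actual components.

```python
import random
import re
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical sensitive-function list; the paper's actual library is not given here.
SENSITIVE_FUNCS = ("strcpy", "strcat", "sprintf", "memcpy", "gets", "scanf")
CALL_RE = re.compile(r"\b(" + "|".join(SENSITIVE_FUNCS) + r")\s*\(")
TOKEN_RE = re.compile(r"[A-Za-z_]\w*")

Sample = Tuple[str, int]  # (code slice, label: 1 = vulnerable, 0 = safe)


def sensitive_lines(source: str) -> List[int]:
    """Return 0-based indices of lines that call a sensitive function."""
    return [i for i, line in enumerate(source.splitlines()) if CALL_RE.search(line)]


def slice_around(source: str, idx: int, window: int = 2) -> str:
    """Crude line-window slice: a hypothetical stand-in for AST-based slicing."""
    lines = source.splitlines()
    return "\n".join(lines[max(0, idx - window): idx + window + 1])


def bootstrap(data: Sequence[Sample], rng: random.Random) -> List[Sample]:
    """Random sampling with replacement: one bag per base learner."""
    return [rng.choice(data) for _ in range(len(data))]


def train_stub(bag: Sequence[Sample]) -> Callable[[str], int]:
    """Hypothetical stand-in for doc2vec + Bi-LSTM: signed token weights."""
    weight: Counter = Counter()
    for code, label in bag:
        for tok in TOKEN_RE.findall(code):
            weight[tok] += 1 if label == 1 else -1

    def predict(code: str) -> int:
        score = sum(weight[t] for t in TOKEN_RE.findall(code))
        return 1 if score > 0 else 0

    return predict


def train_ensemble(data: Sequence[Sample], n_learners: int = 5,
                   seed: int = 0) -> List[Callable[[str], int]]:
    """Train each base learner on its own bootstrap bag (sequentially here)."""
    rng = random.Random(seed)
    return [train_stub(bootstrap(data, rng)) for _ in range(n_learners)]


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most common label; use an odd learner count to avoid ties."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_predict(learners: Sequence[Callable[[str], int]], code: str) -> int:
    return majority_vote([learner(code) for learner in learners])


if __name__ == "__main__":
    src = "int main() {\n  char buf[8];\n  strcpy(buf, argv[1]);\n  return 0;\n}"
    for i in sensitive_lines(src):
        print(slice_around(src, i, window=1))
```

Because each base learner only sees its own bag, the calls inside `train_ensemble` are independent of one another and could be dispatched to separate workers, which is the property a distributed ensemble relies on; this sketch simply runs them one after another.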
