Paper title
Cross Project Software Vulnerability Detection via Domain Adaptation and Max-Margin Principle
Paper authors
Paper abstract
Software vulnerabilities (SVs) have become a common and serious concern due to the ubiquity of computer software. Many machine learning-based approaches have been proposed to solve the software vulnerability detection (SVD) problem. However, two significant issues remain open for SVD: i) learning automatic representations to improve predictive performance, and ii) tackling the scarcity of labeled vulnerability datasets, which conventionally require laborious labeling effort by experts. In this paper, we propose a novel end-to-end approach to tackle these two issues. We first exploit automatic representation learning with deep domain adaptation for software vulnerability detection. We then propose a novel cross-domain kernel classifier that leverages the max-margin principle to significantly improve the transfer learning of software vulnerabilities from labeled projects to unlabeled ones. Experimental results on real-world software datasets show the superiority of our proposed method over state-of-the-art baselines. In short, our method improves F1-measure, the most important measure in SVD, by 1.83% to 6.25% over the second-best method on the datasets used. Our released source code samples are publicly available at https://github.com/vannguyennd/dam2p
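For readers unfamiliar with the F1-measure cited in the abstract, the following is a minimal sketch of how it is commonly computed for binary classification: the harmonic mean of precision and recall on the positive (vulnerable) class. The function name `f1_measure` and the toy labels are illustrative assumptions and are not taken from the paper or its released code.

```python
def f1_measure(y_true, y_pred):
    """F1 for binary labels, assuming 1 = vulnerable and 0 = non-vulnerable."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        # No correct positive prediction: precision or recall is zero
        # (or undefined), so F1 is reported as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth and predictions for eight functions.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0, 0, 1]
print(f"F1 = {f1_measure(y_true, y_pred):.4f}")
```

Because F1 ignores true negatives, it is typically preferred over plain accuracy when the positive class is rare, which is often the case for vulnerable code.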