Paper Title

A Machine Learning Based Framework for Code Clone Validation

Authors

Mostaeen, Golam; Roy, Banani; Roy, Chanchal; Schneider, Kevin; Svajlenko, Jeffrey

Abstract

A code clone is a pair of similar code fragments, located within a single software system or across different systems. Since code clones often negatively impact the maintainability of a software system, several code clone detection techniques and tools have been proposed and studied over the last decade. To detect all possible similar source code patterns, clone detection tools generally work at the syntax level and do not take user-specific preferences into account. As a result, the reported clones often must be manually inspected before analysis in order to remove false positives from consideration. This manual clone validation effort is very time-consuming and often error-prone, in particular for large-scale clone detection. In this paper, we propose a machine learning approach that automates the validation process, so that clones can be validated without human inspection. The proposed approach can thus be used to remove false positive clones from detection results, to automatically evaluate the precision of any clone detector on any given set of datasets, to evaluate existing clone benchmark datasets, or even to build new clone benchmarks and datasets with minimal effort. In an experiment with clones detected by several clone detectors in several different software systems, we found that our approach achieves an accuracy of up to 87.4% when compared against manual validation by multiple expert judges. The proposed method also shows better results in several comparative studies against existing related approaches for clone classification.
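To make two of the uses named in the abstract concrete, the following is a minimal sketch of how an automated validator's output could be turned into an estimated detector precision and into an agreement score against manual labels. The abstract does not describe the features or the model, so the `validate` function here is only a hypothetical token-overlap stand-in for a trained classifier, and all names, the threshold, and the sample data are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# A reported clone is a pair of code fragments (hypothetical representation).
ClonePair = Tuple[str, str]


def token_jaccard(pair: ClonePair) -> float:
    """Jaccard similarity over whitespace-separated tokens of both fragments."""
    a, b = set(pair[0].split()), set(pair[1].split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def validate(pair: ClonePair, threshold: float = 0.8) -> bool:
    """Stand-in validator: NOT the paper's model, just a placeholder
    that accepts a pair when its token overlap reaches the threshold."""
    return token_jaccard(pair) >= threshold


def estimated_precision(reported: List[ClonePair],
                        validator: Callable[[ClonePair], bool]) -> float:
    """Fraction of reported clones that the validator accepts as true clones."""
    if not reported:
        return 0.0
    return sum(1 for p in reported if validator(p)) / len(reported)


def agreement_accuracy(predicted: List[bool], manual: List[bool]) -> float:
    """Fraction of pairs where the validator agrees with the manual label."""
    if len(predicted) != len(manual):
        raise ValueError("label lists must have the same length")
    if not manual:
        return 0.0
    return sum(p == m for p, m in zip(predicted, manual)) / len(manual)


# Illustrative data only: three pairs reported by some clone detector.
reported = [
    ("int a = b + c ;", "int a = b + c ;"),
    ("int a = b + c ;", "int x = y + c ;"),
    ("return x ;", "while ( i < n ) i ++ ;"),
]
# Made-up labels standing in for expert judgments.
manual_labels = [True, True, False]

predicted_labels = [validate(p) for p in reported]
precision = estimated_precision(reported, validate)
accuracy = agreement_accuracy(predicted_labels, manual_labels)
```

In this framing, the 87.4% figure in the abstract corresponds to the kind of quantity `agreement_accuracy` computes, with a trained classifier in place of the placeholder `validate`.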
