Title

Multi-threshold token-based code clone detection

Authors

Yaroslav Golubev, Viktor Poletansky, Nikita Povarov, Timofey Bryksin

Abstract

Clone detection plays an important role in software engineering. Finding clones within a single project reveals possible refactoring opportunities, and finding them across different projects can help detect code reuse or possible licensing violations. In this paper, we propose a modification to bag-of-tokens based clone detection that detects more clone pairs of greater diversity without losing precision by implementing a multi-threshold search, i.e., conducting the search several times, each aimed at a different group of clones. To combat the increase in running time that this approach brings about, we propose an optimization that significantly decreases the overlap in detected clones between the searches. We evaluate the method with two popular clone detection tools on two datasets of different sizes. Depending on the dataset, the technique increases the number of detected clones by 40.5-56.6%. An evaluation on BigCloneBench also shows that the recall for Strongly Type-3 clones increases from 37.5% to 59.6%.
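To make the idea of a multi-threshold bag-of-tokens search more concrete, here is a minimal Python sketch. It assumes a SourcererCC-style overlap similarity (two fragments A and B count as clones at threshold θ when the multiset intersection of their tokens reaches ⌈θ · max(|A|, |B|)⌉), a naive whitespace tokenizer, and a brute-force all-pairs comparison. The function names (`bag_of_tokens`, `is_clone`, `multi_threshold_search`) are hypothetical. Skipping pairs that a stricter pass already reported is only a simple stand-in for overlap reduction. It is not the optimization proposed in the paper, and the sketch does not reproduce how the authors choose thresholds or clone groups.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def bag_of_tokens(code: str) -> Counter:
    # Hypothetical tokenizer: splits on whitespace only.
    # Real tools (e.g. SourcererCC, CloneWorks) use a language-aware lexer.
    return Counter(code.split())


def is_clone(a: Counter, b: Counter, threshold: float) -> bool:
    # Overlap similarity: size of the multiset intersection of the two bags
    # must reach ceil(threshold * size of the larger bag).
    overlap = sum((a & b).values())
    larger = max(sum(a.values()), sum(b.values()))
    return overlap >= ceil(threshold * larger)


def multi_threshold_search(
    fragments: dict[str, str], thresholds: list[float]
) -> dict[tuple[str, str], float]:
    """Run the search once per threshold; map each pair to the strictest threshold that found it."""
    bags = {name: bag_of_tokens(code) for name, code in fragments.items()}
    found: dict[tuple[str, str], float] = {}
    # Strictest threshold first, so each later (looser) pass only adds new pairs.
    for theta in sorted(thresholds, reverse=True):
        for x, y in combinations(sorted(bags), 2):
            if (x, y) in found:
                continue  # already reported by a stricter pass; avoid duplicate results
            if is_clone(bags[x], bags[y], theta):
                found[(x, y)] = theta
    return found


if __name__ == "__main__":
    fragments = {
        "f1": "a b c d e f g h i j",
        "f2": "a b c d e f g h i k",
        "f3": "a b c d e f g x y z",
        "f4": "p q r s t u v w x y",
    }
    print(multi_threshold_search(fragments, [0.8, 0.65]))
    # {('f1', 'f2'): 0.8, ('f1', 'f3'): 0.65, ('f2', 'f3'): 0.65}
```

In this toy run, the strict 0.8 pass finds only the near-identical pair `f1`/`f2`. The looser 0.65 pass adds the more divergent pairs involving `f3`, which is the kind of gain the multi-threshold approach targets. Because every pass compares every pair, the running time grows with the number of thresholds, which is the cost the paper's optimization is meant to reduce.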