Paper Title
MISIM: A Neural Code Semantics Similarity System Using the Context-Aware Semantics Structure
Authors
Abstract
Code semantics similarity can be used for many tasks such as code recommendation, automated software defect correction, and clone detection. However, the accuracy of such systems has not yet reached a level of general-purpose reliability. To help address this, we present Machine Inferred Code Similarity (MISIM), a neural code semantics similarity system consisting of two core components: (i) MISIM uses a novel context-aware semantics structure, which was purpose-built to lift semantics from code syntax; (ii) MISIM uses an extensible neural code similarity scoring algorithm, which can be used for various neural network architectures with learned parameters. We compare MISIM to four state-of-the-art systems, including two additional hand-customized models, over 328K programs consisting of over 18 million lines of code. Our experiments show that MISIM has 8.08% better accuracy (using MAP@R) compared to the next best performing system.
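The abstract reports accuracy as MAP@R without defining it. The sketch below assumes the definition commonly used in the metric-learning literature: for a query with R relevant items in the reference set, take the R nearest neighbors and average the precision at each rank where a relevant item appears, dividing by R. The function name `map_at_r` and the label strings are hypothetical and chosen for illustration; this is not the authors' evaluation code.

```python
from typing import Sequence


def map_at_r(ranked_labels: Sequence[str], query_label: str, num_relevant: int) -> float:
    """MAP@R for a single query (assumed definition, see lead-in).

    ranked_labels: labels of the retrieved programs, nearest neighbor first.
    query_label:   label (e.g. problem id) of the query program.
    num_relevant:  R, the number of other programs sharing the query's label.
    """
    if num_relevant == 0:
        return 0.0
    hits = 0
    total = 0.0
    # Only the top R retrievals count toward the score.
    for rank, label in enumerate(ranked_labels[:num_relevant], start=1):
        if label == query_label:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / num_relevant


# Hypothetical query: a "sort" program with R = 3 other "sort" programs.
# Top 3 neighbors are sort, sort, search, so the score is (1/1 + 2/2 + 0) / 3.
score = map_at_r(["sort", "sort", "search", "sort"], "sort", 3)
print(round(score, 4))
```

A system-level score would then be the mean of this value over all query programs.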