Dependency-Based Neural Representations for Classifying Lines of Programs

Paper Title

Dependency-Based Neural Representations for Classifying Lines of Programs

Authors

Srikant, Shashank; Lesimple, Nicolas; O'Reilly, Una-May

Abstract

We investigate the problem of using machine learning to classify whether a line of a program contains a vulnerability. Such a line-level classification task calls for a program representation that goes beyond reasoning about the tokens present in the line. We seek a distributed representation in a latent feature space that captures the control and data dependencies of the tokens appearing on a line of a program, while also ensuring that lines with similar meanings have similar features. We present a neural architecture, Vulcan, that satisfies both of these requirements. It extracts contextual information about the tokens in a line and feeds it, in the form of Abstract Syntax Tree (AST) paths, to a bidirectional LSTM with an attention mechanism. Concurrently, it represents the meaning of each token in a line by recursively embedding the line where that token was most recently defined. In our experiments, Vulcan compares favorably with a state-of-the-art classifier that requires significant preprocessing of programs, suggesting the utility of deep learning for modeling program dependence information.
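The idea of representing a token by recursively embedding the line where it was most recently defined can be illustrated with a toy sketch. This is not Vulcan's implementation: it assumes straight-line Python source with one statement per line (no branches, loops, or augmented assignment), uses Python's standard `ast` module to find the names each line reads and writes, and replaces the learned components with a hypothetical hash-based `token_vector` and a fixed mixing weight `mix`. The AST-path bidirectional LSTM with attention is not modeled, and the helper names `def_use_links` and `embed_lines` are illustrative only.

```python
import ast
import hashlib

DIM = 4  # toy embedding width (hypothetical; a real model would learn embeddings)


def def_use_links(source: str) -> dict[int, dict[str, int]]:
    """Map each line to {name it reads: line that most recently defined that name}.

    Simplification: assumes straight-line code with one statement per line,
    so the most recent definition is just the last earlier assignment.
    """
    last_def: dict[str, int] = {}
    links: dict[int, dict[str, int]] = {}
    for stmt in ast.parse(source).body:
        uses, defs = set(), set()
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load):
                    uses.add(node.id)
                elif isinstance(node.ctx, ast.Store):
                    defs.add(node.id)
        # Resolve uses before recording this line's definitions, so that
        # `x = x + 1` links to the previous definition of x.
        links[stmt.lineno] = {n: last_def[n] for n in sorted(uses) if n in last_def}
        for name in defs:
            last_def[name] = stmt.lineno
    return links


def token_vector(token: str) -> list[float]:
    """Deterministic stand-in for a learned token embedding (hypothetical)."""
    digest = hashlib.md5(token.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]


def embed_lines(source: str, mix: float = 0.5) -> dict[int, list[float]]:
    """Embed each line as the sum of its token vectors plus `mix` times the
    embeddings of the lines defining the names it reads (applied recursively)."""
    lines = source.splitlines()
    links = def_use_links(source)
    cache: dict[int, list[float]] = {}

    def embed(line: int) -> list[float]:
        if line in cache:
            return cache[line]
        vec = [0.0] * DIM
        for tok in lines[line - 1].split():  # crude whitespace tokenizer
            vec = [a + b for a, b in zip(vec, token_vector(tok))]
        for dep in links[line].values():  # defining lines are always earlier
            vec = [a + mix * b for a, b in zip(vec, embed(dep))]
        cache[line] = vec
        return vec

    return {line: embed(line) for line in links}


src = "a = 1\nb = a + 2\na = b * 3\nc = a + b\n"
print(def_use_links(src))  # {1: {}, 2: {'a': 1}, 3: {'b': 2}, 4: {'a': 3, 'b': 2}}
```

Because every defining line precedes the lines that use it, the recursion always bottoms out at lines with no dependencies and terminates. Handling real control flow would require a reaching-definitions analysis over the control-flow graph rather than this single forward pass.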
