SCELMo: Source Code Embeddings from Language Models

Title

SCELMo: Source Code Embeddings from Language Models

Authors

Karampatsis, Rafael-Michael; Sutton, Charles

Abstract

Continuous embeddings of tokens in computer programs have been used to support a variety of software development tools, including readability, code search, and program repair. Contextual embeddings are common in natural language processing but have not been previously applied in software engineering. We introduce a new set of deep contextualized word representations for computer programs based on language models. We train a set of embeddings using the ELMo (embeddings from language models) framework of Peters et al. (2018). We investigate whether these embeddings are effective when fine-tuned for the downstream task of bug detection. We show that even a low-dimensional embedding trained on a relatively small corpus of programs can improve a state-of-the-art machine learning system for bug detection.
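In the ELMo framework (Peters et al., 2018), each token k gets one vector per layer of a pretrained bidirectional language model, h_{k,0} ... h_{k,L}, and these are collapsed into a single contextual embedding ELMo_k = γ Σ_j s_j h_{k,j}, where the s_j are softmax-normalized weights and γ is a scalar. The sketch below illustrates only that generic combination step, assuming the per-layer vectors for a code token have already been computed. It is not SCELMo's training or fine-tuning code, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(weights: List[float]) -> List[float]:
    """Numerically stable softmax over the raw per-layer weights."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def elmo_scalar_mix(layers: List[Vector], raw_weights: List[float], gamma: float) -> Vector:
    """Collapse the L+1 layer vectors of one token into a single embedding.

    layers[j] plays the role of h_{k,j}: j = 0 is the context-independent
    token layer and j >= 1 are the bidirectional LM layers (hypothetical
    inputs here). Returns gamma * sum_j softmax(raw_weights)[j] * layers[j].
    """
    if len(layers) != len(raw_weights):
        raise ValueError("need exactly one weight per layer")
    s = softmax(raw_weights)
    dim = len(layers[0])
    mixed = [0.0] * dim
    for s_j, h_j in zip(s, layers):
        for i in range(dim):
            mixed[i] += s_j * h_j[i]
    return [gamma * x for x in mixed]


if __name__ == "__main__":
    # Toy 2-dimensional layer vectors for a single code token, e.g. `x` in `if (x == null)`.
    toy_layers = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(elmo_scalar_mix(toy_layers, [0.0, 0.0, 0.0], gamma=1.0))
```

With all raw weights equal, each s_j is 1/3 and the result is simply the average of the layers (about [1.0, 1.0] for the toy vectors above). In ELMo the raw weights and γ are learned together with the downstream model, so fine-tuning for a task such as bug detection can shift how much each layer contributes.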
