Paper title
Is GitHub's Copilot as Bad as Humans at Introducing Vulnerabilities in Code?
Paper authors
Paper abstract
Several advances in deep learning have been successfully applied to the software development process. Of recent interest is the use of neural language models to build tools, such as Copilot, that assist in writing code. In this paper, we perform a comparative empirical analysis of Copilot-generated code from a security perspective. The aim of this study is to determine if Copilot is as bad as human developers. We investigate whether Copilot is just as likely to introduce the same software vulnerabilities as human developers. Using a dataset of C/C++ vulnerabilities, we prompt Copilot to generate suggestions in scenarios that led to the introduction of vulnerabilities by human developers. The suggestions are inspected and categorized in a two-stage process based on whether the original vulnerability or fix is reintroduced. We find that Copilot replicates the original vulnerable code about 33% of the time while replicating the fixed code at a 25% rate. However, this behaviour is not consistent: Copilot is more likely to introduce some types of vulnerabilities than others and is also more likely to generate vulnerable code in response to prompts that correspond to older vulnerabilities. Overall, given that in a significant number of cases it did not replicate the vulnerabilities previously introduced by human developers, we conclude that Copilot, despite performing differently across various vulnerability types, is not as bad as human developers at introducing vulnerabilities in code.