Paper Title

Is this Change the Answer to that Problem? Correlating Descriptions of Bug and Code Changes for Evaluating Patch Correctness

Authors

Haoye Tian, Xunzhu Tang, Andrew Habib, Shangwen Wang, Kui Liu, Xin Xia, Jacques Klein, Tegawendé F. Bissyandé

Abstract

In this work, we propose a novel perspective on the problem of patch correctness assessment: a correct patch implements changes that "answer" the problem posed by buggy behaviour. Concretely, we turn patch correctness assessment into a Question Answering problem. To tackle this problem, our intuition is that natural language processing can provide the necessary representations and models for assessing the semantic correlation between a bug (question) and a patch (answer). Specifically, we consider as inputs the bug reports as well as the natural language descriptions of the generated patches. Our approach, Quatrain, first uses state-of-the-art commit message generation models to produce the relevant inputs associated with each generated patch. Then we leverage a neural network architecture to learn the semantic correlation between bug reports and commit messages. Experiments on a large dataset of 9,135 patches generated for three bug datasets (Defects4J, Bugs.jar and Bears) show that Quatrain can achieve an AUC of 0.886 on predicting patch correctness, recalling 93% of correct patches while filtering out 62% of incorrect patches. Our experimental results further demonstrate the influence of input quality on prediction performance. We further perform experiments to highlight that the model indeed learns the relationship between bug reports and code change descriptions for the prediction. Finally, we compare against prior work and discuss the benefits of our approach.
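To make the question-answering framing concrete, the following is a minimal Python sketch under loud assumptions, not Quatrain's implementation. A plain bag-of-words cosine similarity stands in for the learned neural correlation model, the patch description is assumed to be already available (in Quatrain it would come from a commit message generation model), and all function names and the 0.2 threshold are hypothetical. The sketch also shows one way to compute the kinds of figures reported above: AUC as the probability that a correct patch is ranked above an incorrect one, plus the share of correct patches kept and of incorrect patches filtered out at a given threshold.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (a crude stand-in tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(question: str, answer: str) -> float:
    # Bag-of-words cosine similarity between a bug report (question) and a
    # patch description (answer). Hypothetical stand-in for a learned model.
    va, vb = Counter(tokenize(question)), Counter(tokenize(answer))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def predict_correct(bug_report: str, commit_message: str, threshold: float = 0.2) -> bool:
    # Hypothetical decision rule: a patch is predicted correct when its
    # description "answers" the bug report closely enough.
    return cosine_similarity(bug_report, commit_message) >= threshold


def auc(labels: list[bool], scores: list[float]) -> float:
    # Rank-based AUC: fraction of (correct, incorrect) pairs where the correct
    # patch scores higher; ties count half. Assumes both classes are present.
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_and_filter_rates(labels: list[bool], predictions: list[bool]) -> tuple[float, float]:
    # labels/predictions use True for "correct patch". Returns the share of
    # correct patches recalled and the share of incorrect patches filtered out.
    # Assumes both classes are present.
    kept_correct = [p for l, p in zip(labels, predictions) if l]
    kept_incorrect = [p for l, p in zip(labels, predictions) if not l]
    recall_correct = sum(kept_correct) / len(kept_correct)
    filtered_incorrect = sum(not p for p in kept_incorrect) / len(kept_incorrect)
    return recall_correct, filtered_incorrect


if __name__ == "__main__":
    # Toy, made-up inputs for illustration only.
    bug = "NullPointerException when parsing empty date string"
    print(predict_correct(bug, "Fix NullPointerException when date string is empty"))
    print(predict_correct(bug, "Update build script version"))
```

A learned model replaces the lexical overlap here precisely because bug reports and patch descriptions often describe the same fix in different words; the cosine baseline only illustrates where such a model plugs into the question/answer pipeline and how its scores feed the evaluation metrics.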
