Paper Title
Better Automatic Program Repair by Using Bug Reports and Tests Together
Paper Authors
Abstract
Automated program repair is already deployed in industry, but concerns remain about repair quality. Recent research has shown that one of the main reasons repair tools produce incorrect (but seemingly correct) patches is imperfect fault localization (FL). This paper demonstrates that combining information from natural-language bug reports and test executions when localizing faults can have a significant positive impact on repair quality. For example, existing repair tools using such FL correctly repair 7 defects in the Defects4J benchmark that no prior tool has repaired correctly. We develop Blues, the first information-retrieval-based, statement-level FL technique that requires no training data. We further develop RAFL, the first unsupervised method for combining multiple FL techniques, which outperforms a supervised method. Using RAFL, we create SBIR by combining Blues with a spectrum-based FL (SBFL) technique. Evaluated on 815 real-world defects, SBIR consistently ranks buggy statements higher than either of its underlying techniques. We then modify three state-of-the-art repair tools, Arja, SequenceR, and SimFix, to use SBIR, SBFL, and Blues as their internal FL. We evaluate the quality of the produced patches on 689 real-world defects. Arja and SequenceR benefit significantly from SBIR: Arja using SBIR correctly repairs 28 defects, but only 21 using SBFL and only 15 using Blues; SequenceR using SBIR correctly repairs 12 defects, but only 10 using SBFL and only 4 using Blues. SimFix (which has internal mechanisms to overcome poor FL) correctly repairs 30 defects using either SBIR or SBFL, but only 13 using Blues. Our work is the first investigation of simultaneously using multiple software artifacts for automated program repair, and our promising findings suggest that future research in this direction is likely to be fruitful.