Paper Title
Industry-scale IR-based Bug Localization: A Perspective from Facebook
Paper Authors
Paper Abstract
We explore the application of Information Retrieval (IR) based bug localization methods in a large industrial setting, Facebook. Facebook's code base evolves rapidly, with thousands of code changes being committed to a monolithic repository every day. When a bug is detected, it is often time-sensitive and imperative to identify the commit causing the bug in order to either revert it or fix it. This is complicated by the fact that bugs often manifest with complex and unwieldy features, such as stack traces and other metadata. Code commits also have various features associated with them, ranging from developer comments to test results. This poses unique challenges to bug localization methods, making it a highly non-trivial operation. In this paper, we lay out several practical concerns for industry-level IR-based bug localization, and propose Bug2Commit, a tool that is designed to address these concerns. We also assess the effectiveness of existing IR-based localization techniques from the software engineering community, and find that in the presence of complex queries or documents, which are common at Facebook, existing approaches do not perform as well as Bug2Commit. We evaluate Bug2Commit on three applications at Facebook: client-side crashes from the mobile app, server-side performance regressions, and mobile simulation tests for performance. We find that Bug2Commit outperforms existing approaches in accuracy by up to 17%, leading to reduced time for triaging regressions and attributing bugs found in simulations.
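To make the general idea of IR-based bug localization concrete, the following is a minimal sketch that treats a bug report as a query and each commit as a document, then ranks commits by TF-IDF cosine similarity. This is not Bug2Commit and not any specific technique evaluated in the paper; the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank_commits`), the smoothed IDF formula, and the sample commits and crash text are all assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumed choice) so that no weight is ever zero or undefined.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_commits(bug_text, commits):
    """Rank commits (commit_id -> text) by similarity to the bug text.

    Returns a list of (commit_id, score) pairs, highest score first.
    """
    ids = list(commits)
    docs = [tokenize(commits[c]) for c in ids] + [tokenize(bug_text)]
    vectors = tfidf_vectors(docs)
    query = vectors[-1]
    scored = [(c, cosine(query, v)) for c, v in zip(ids, vectors[:-1])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical commits and crash report, invented for this example.
    commits = {
        "c1": "Fix image cache eviction in FeedImageLoader",
        "c2": "Update login button color",
        "c3": "Refactor network retry policy",
    }
    bug = "Crash: NullPointerException at FeedImageLoader.load image cache"
    for commit_id, score in rank_commits(bug, commits):
        print(commit_id, round(score, 3))
```

In this sketch, a query and a commit are each flattened into a single bag of words. The abstract points out that real bug reports and commits carry several distinct features (stack traces, metadata, developer comments, test results), which is the kind of complexity a flat representation like this one does not capture.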