Paper Title

How are Software Repositories Mined? A Systematic Literature Review of Workflows, Methodologies, Reproducibility, and Tools

Authors

Adam Tutko, Austin Z. Henley, Audris Mockus

Abstract

With the advent of open source software, a veritable treasure trove of previously proprietary software development data was made available. This opened the field of empirical software engineering research to anyone in academia. Data that is mined from software projects, however, requires extensive processing and needs to be handled with utmost care to ensure valid conclusions. Since the software development practices and tools have changed over two decades, we aim to understand the state-of-the-art research workflows and to highlight potential challenges. We employ a systematic literature review by sampling over one thousand papers from leading conferences and by analyzing the 286 most relevant papers from the perspective of data workflows, methodologies, reproducibility, and tools. We found that an important part of the research workflow involving dataset selection was particularly problematic, which raises questions about the generality of the results in existing literature. Furthermore, we found a considerable number of papers provide little or no reproducibility instructions -- a substantial deficiency for a data-intensive field. In fact, 33% of papers provide no information on how their data was retrieved. Based on these findings, we propose ways to address these shortcomings via existing tools and also provide recommendations to improve research workflows and the reproducibility of research.
