Paper Title

Fork Entropy: Assessing the Diversity of Open Source Software Projects' Forks

Authors

Liang Wang, Zhiwen Zheng, Xiangchen Wu, Baihui Sang, Jierui Zhang, Xianping Tao

Abstract

On open source software (OSS) platforms such as GitHub, forking and accepting pull requests is an important approach for OSS projects to receive contributions, especially from external contributors who cannot directly commit into the source repositories. Having a large number of forks is often considered an indicator of a project's popularity. While extensive studies have been conducted to understand the reasons for forking, communications between forks, and the features and impacts of forks, there are few quantitative measures that can provide a simple yet informative way to gain insights into an OSS project's forks besides their count. Inspired by studies on biodiversity and OSS team diversity, in this paper, we propose an approach to measure the diversity of an OSS project's forks (i.e., its fork population). We devise a novel fork entropy metric based on Rao's quadratic entropy to measure such diversity according to the forks' modifications to project files. With properties including symmetry, continuity, and monotonicity, the proposed fork entropy metric is effective in quantifying the diversity of a project's fork population. To further examine the usefulness of the proposed metric, we conduct empirical studies with data retrieved from fifty projects on GitHub. We observe significant correlations between a project's fork entropy and different outcome variables, including the project's external productivity measured by the number of external contributors' commits, the acceptance rate of external contributors' pull requests, and the number of reported bugs. We also observe significant interactions between fork entropy and other factors such as the number of forks. The results suggest that fork entropy effectively enriches our understanding of OSS projects' forks beyond the simple number of forks, and can potentially support further research and applications.
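
To make the idea of a diversity score over a fork population concrete, here is a minimal sketch of Rao's quadratic entropy in its general form, Q = sum over i, j of p_i * p_j * d_ij. The abstract does not give the paper's exact formulation, so everything specific here is an assumption for illustration: each fork is represented by the set of file paths it modified, forks are weighted equally by default, and the dissimilarity d_ij is the Jaccard distance between two file sets. The function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Dissimilarity between two sets of modified files (assumed choice of d_ij)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def rao_quadratic_entropy(forks, weights=None, distance=jaccard_distance):
    """Rao's quadratic entropy: sum over all pairs (i, j) of p_i * p_j * d_ij.

    forks:   list of iterables, each holding the file paths one fork modified
    weights: optional non-negative weights; equal weights are assumed if omitted
    """
    n = len(forks)
    if n == 0:
        return 0.0
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    probs = [w / total for w in weights]  # normalize so the p_i sum to 1
    return sum(
        probs[i] * probs[j] * distance(forks[i], forks[j])
        for i in range(n)
        for j in range(n)
    )


# Hypothetical fork populations for illustration only.
similar_forks = [{"README.md", "src/a.py"}, {"README.md", "src/a.py"}]
diverse_forks = [{"README.md"}, {"src/a.py"}, {"docs/index.md"}]
print(rao_quadratic_entropy(similar_forks))
print(rao_quadratic_entropy(diverse_forks))
```

Under these assumptions, a population whose forks all touch the same files scores zero, and the score grows as forks modify increasingly different parts of the project. Reordering the forks does not change the result, which matches the symmetry property named in the abstract.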
