Paper Title

What Makes a Popular Academic AI Repository?

Authors

Fan, Yuanrui; Xia, Xin; Lo, David; Hassan, Ahmed E.; Li, Shanping

Abstract

Many AI researchers publish the code, data, and other resources that accompany their papers in GitHub repositories. In this paper, we refer to these repositories as academic AI repositories. Our preliminary study shows that highly cited papers are more likely to have popular academic AI repositories (and vice versa). Hence, we perform an empirical study of academic AI repositories to highlight, for AI researchers, the good software engineering practices of popular ones. We collect 1,149 academic AI repositories and label the top 20% by number of stars as popular and the bottom 70% as unpopular. The remaining 10% serve as a gap between the popular and unpopular groups. We propose 21 features to characterize the software engineering practices of academic AI repositories. Our experimental results show that popular and unpopular academic AI repositories differ statistically significantly in 11 of the 21 studied features, indicating that the two groups follow significantly different software engineering practices. Furthermore, we find that the number of links to other GitHub repositories in the README file, the number of images in the README file, and the inclusion of a license are the most important features for differentiating the two groups. Our dataset and code are publicly available to the community.
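The abstract does not say exactly how the labels or the README features were computed, so the following is only a minimal Python sketch under our own assumptions. The function names `label_by_stars` and `readme_features` are hypothetical. The labeling ranks repositories by star count and applies the 20% / 10% / 70% split described above, with ties at the cut points broken arbitrarily by sort order. The regular expressions for GitHub repository links and README images, the exclusion of links to the repository itself, and passing license presence in as a boolean are illustrative choices, not the authors' feature definitions.

```python
import re
from typing import Dict, Optional

# Assumed pattern: captures "owner/repo" from a github.com URL.
GITHUB_REPO_LINK = re.compile(r"https?://github\.com/([\w.-]+/[\w.-]+)")
# Assumed patterns: Markdown images ![alt](url) and inline HTML <img> tags.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
HTML_IMAGE = re.compile(r"<img\b", re.IGNORECASE)


def label_by_stars(stars: Dict[str, int]) -> Dict[str, Optional[str]]:
    """Top 20% by stars -> 'popular', bottom 70% -> 'unpopular',
    middle 10% -> None (the gap between the two groups)."""
    # Rank from most to fewest stars; ties keep insertion order (arbitrary).
    ranked = sorted(stars, key=lambda name: stars[name], reverse=True)
    n = len(ranked)
    top_cut = round(n * 0.20)       # first index outside the popular group
    bottom_start = round(n * 0.30)  # first index of the bottom 70%
    labels: Dict[str, Optional[str]] = {}
    for i, name in enumerate(ranked):
        if i < top_cut:
            labels[name] = "popular"
        elif i >= bottom_start:
            labels[name] = "unpopular"
        else:
            labels[name] = None  # gap repositories are excluded from comparison
    return labels


def readme_features(readme_text: str, own_repo: str,
                    has_license_file: bool) -> Dict[str, int]:
    """Hypothetical versions of the three most important features."""
    # Strip trailing periods picked up from sentence punctuation.
    linked = [m.rstrip(".") for m in GITHUB_REPO_LINK.findall(readme_text)]
    # Count only links to *other* repositories, not to the repository itself.
    other_links = [r for r in linked if r.lower() != own_repo.lower()]
    images = (len(MARKDOWN_IMAGE.findall(readme_text))
              + len(HTML_IMAGE.findall(readme_text)))
    return {
        "github_links": len(other_links),
        "images": images,
        "has_license": int(has_license_file),
    }
```

For example, with ten repositories holding 0 to 9 stars, the two most-starred are labeled popular, the third is left in the gap, and the remaining seven are labeled unpopular. Dropping the gap group before comparison is what makes the popular and unpopular sets more clearly separated.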