Creating optimal conditions for reproducible data analysis in R with 'fertile'

Paper title

Creating optimal conditions for reproducible data analysis in R with 'fertile'

Authors

Audrey M. Bertin, Benjamin S. Baumer

Abstract

The advancement of scientific knowledge increasingly depends on ensuring that data-driven research is reproducible: that two people with the same data obtain the same results. However, while the necessity of reproducibility is clear, there are significant behavioral and technical challenges that impede its widespread implementation, and no clear consensus on standards of what constitutes reproducibility in published research. We present fertile, an R package that focuses on a series of common mistakes programmers make while conducting data science projects in R, primarily through the RStudio integrated development environment. fertile operates in two modes: proactively (to prevent reproducibility mistakes from happening in the first place), and retroactively (analyzing code that is already written for potential problems). Furthermore, fertile is designed to educate users on why their mistakes are problematic and how to fix them.
