Paper title
Advancing computational reproducibility in the Dataverse data repository platform
Paper authors
Paper abstract
Recent reproducibility case studies have raised concerns by showing that much deposited research is not reproducible. One of their conclusions was that the way data repositories store research data and code cannot fully facilitate reproducibility, because the runtime environment needed to execute the code is absent. New specialized reproducibility tools provide cloud-based computational environments for code encapsulation, thus enabling research portability and reproducibility. However, they often do not provide research discoverability, standardized data citation, or long-term archiving the way data repositories do. This paper addresses the shortcomings of data repositories and reproducibility tools, and how they could be overcome to remedy the current lack of computational reproducibility in published and archived research outputs.