Estimating a Null Model of Scientific Image Reuse to Support Research Integrity Investigations

Title

Estimating a Null Model of Scientific Image Reuse to Support Research Integrity Investigations

Authors

Acuna, Daniel E., Xiang, Ziyue

Abstract

When a suspected case of figure reuse arises in science, research integrity investigators often find it difficult to rebut authors who claim that "it happened by chance". In other words, when image features "collide", it is difficult to establish whether such a collision is rare. In this article, we provide a method to predict the rarity of an image feature by statistically estimating the chance that it occurs at random across all scientific imagery. Our method is based on high-dimensional density estimation of ORB features computed from more than 7 million images in the PubMed Open Access Subset. We show that this method can provide meaningful feedback during research integrity investigations by supplying a null hypothesis for scientific image reuse and, therefore, a p-value during deliberations. We apply the model to a sample of increasingly complex images and confirm that, as expected, it produces progressively smaller p-values. We discuss applications to research integrity investigations as well as future work.
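To make the idea of a null model for feature collisions concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that ORB descriptors (256-bit binary strings) are packed into Python integers. It also replaces the paper's high-dimensional density estimate with a simpler empirical count of background descriptors inside a Hamming-distance ball. The function names, the `radius` parameter, the add-one smoothing, the synthetic background data, and the independence assumption used to combine several matched features are all hypothetical choices made for illustration.

```python
import random
from typing import Sequence

# ORB descriptors are 256-bit binary strings; here each one is packed into a Python int.
DESCRIPTOR_BITS = 256


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed descriptors."""
    return bin(a ^ b).count("1")


def null_match_probability(query: int, background: Sequence[int], radius: int) -> float:
    """Estimate how often a descriptor within `radius` bits of `query` occurs
    by chance, as the fraction of background descriptors in that Hamming ball.
    Add-one smoothing (a hypothetical choice) keeps the estimate above zero."""
    hits = sum(1 for d in background if hamming(query, d) <= radius)
    return (hits + 1) / (len(background) + 1)


def reuse_p_value(matched: Sequence[int], background: Sequence[int], radius: int) -> float:
    """Chance that all `matched` features collide at random, ASSUMING
    (hypothetically) that features occur independently of one another."""
    p = 1.0
    for q in matched:
        p *= null_match_probability(q, background, radius)
    return p


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic background: random descriptors plus many near-copies of one
    # "generic" feature, standing in for a pattern shared by many figures.
    common = rng.getrandbits(DESCRIPTOR_BITS)
    background = [rng.getrandbits(DESCRIPTOR_BITS) for _ in range(1000)]
    for _ in range(200):
        noisy = common
        for _ in range(3):
            noisy ^= 1 << rng.randrange(DESCRIPTOR_BITS)  # flip a random bit
        background.append(noisy)

    distinctive = [rng.getrandbits(DESCRIPTOR_BITS) for _ in range(3)]
    print(reuse_p_value([common], background, radius=10))  # at least 201/1201
    print(reuse_p_value(distinctive[:1], background, radius=10))
    print(reuse_p_value(distinctive, background, radius=10))  # smaller still
```

Under these assumptions, a feature that is common across the background collection receives a large null probability, while a distinctive feature receives one close to the smoothing floor. Multiplying over more matched features gives smaller p-values, which loosely mirrors the trend described above for increasingly complex imagery. Real ORB features within an image are correlated, so the independence product would likely overstate significance. A proper density model over the full feature space would be needed in practice.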