Paper title
Estimating a Null Model of Scientific Image Reuse to Support Research Integrity Investigations
Paper authors
Abstract
When a suspicious case of figure reuse arises in science, research integrity investigators often find it difficult to rebut authors who claim that "it happened by chance". In other words, when image features "collide", it is hard to establish how rare such a collision actually is. In this article, we provide a method to quantify the rarity of an image feature by statistically estimating the chance that it occurs at random across all scientific imagery. Our method is based on high-dimensional density estimation of ORB features using more than 7 million images in the PubMed Open Access Subset dataset. We show that this method can provide meaningful feedback during research integrity investigations by supplying a null hypothesis for scientific image reuse, and thus a p-value to inform deliberations. We apply the model to a sample of increasingly complex imagery and confirm that it produces progressively smaller p-values, as expected. We discuss applications to research integrity investigations as well as future work.
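To make the idea of a null model concrete, the following is a minimal sketch, not the paper's method. It replaces the high-dimensional density estimation described above with a much simpler stand-in: an empirical frequency count over a reference set of 256-bit binary descriptors (the size of a standard ORB descriptor). The reference set is synthetic, and the function names, the Hamming radius of 32 bits, and the add-one smoothing are all assumptions made for illustration.

```python
import random

N_BITS = 256  # a standard ORB descriptor is a 256-bit binary string


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def empirical_p_value(query: int, reference: list, radius: int) -> float:
    """Share of reference descriptors within `radius` bits of `query`.

    Add-one smoothing keeps the estimate strictly positive, so a feature
    never seen in the reference set is reported as rare, not impossible.
    This is an illustrative stand-in for a proper density estimate.
    """
    hits = sum(1 for r in reference if hamming(query, r) <= radius)
    return (hits + 1) / (len(reference) + 1)


def make_reference(n: int, seed: int = 0):
    """Synthetic reference set (hypothetical data, not PubMed features).

    Half of the descriptors are noisy copies of one 'common' pattern,
    the other half are uniformly random.
    """
    rng = random.Random(seed)
    common = rng.getrandbits(N_BITS)
    reference = []
    for i in range(n):
        if i % 2 == 0:
            d = common
            for _ in range(8):  # flip at most 8 bits of the common pattern
                d ^= 1 << rng.randrange(N_BITS)
            reference.append(d)
        else:
            reference.append(rng.getrandbits(N_BITS))
    return common, reference


common, reference = make_reference(2000)
rare = random.Random(123).getrandbits(N_BITS)  # a descriptor unlike the common one

p_common = empirical_p_value(common, reference, radius=32)
p_rare = empirical_p_value(rare, reference, radius=32)
print(f"common feature: p = {p_common:.4f}")
print(f"rare feature:   p = {p_rare:.4f}")
```

Under these assumptions, a descriptor that resembles many reference features receives a large p-value, so a collision on it is weak evidence of reuse, while a descriptor with few or no neighbours receives a small one. A real analysis would need actual ORB descriptors and a density model that scales to millions of images.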