Paper Title
Benchmarking features from different radiomics toolkits / toolboxes using Image Biomarkers Standardization Initiative
Paper Authors
Abstract
There is no consensus on radiomic feature terminology, the underlying mathematics, or their implementation. This creates a scenario in which features extracted with different toolboxes cannot be used to build or validate the same model, leading to non-generalizable radiomic results. In this study, the phantom and benchmark values established by the Image Biomarker Standardization Initiative (IBSI) were used to compare the variation of radiomic features across 6 publicly available software programs and 1 in-house radiomics pipeline. All IBSI-standardized features (11 classes, 173 in total) were extracted. The relative differences between the feature values extracted by the different software and the IBSI benchmark values were calculated to measure inter-software agreement. To better understand the variations, features were further grouped into 3 categories according to their properties: 1) morphology, 2) statistics/histogram, and 3) texture features. While good agreement was observed for the majority of radiomic features across the various programs, relatively poor agreement was observed for morphology features. Significant differences were also found among programs that use different gray-level discretization approaches. Since these programs do not all include every IBSI feature, the level of quantitative assessment for each category was analyzed using Venn and UpSet diagrams and quantified using two ad hoc metrics. Morphology features earned the lowest scores on both metrics, indicating that morphological features are not consistently evaluated across software programs. We conclude that radiomic features calculated using different software programs may not be identical or reliable. Further studies are needed to standardize the workflow of radiomic feature extraction.
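A minimal sketch of two computations the abstract refers to: the relative difference between an extracted feature value and its IBSI benchmark, and the two common gray-level discretization approaches (fixed bin number vs. fixed bin size) whose choice is cited as a source of inter-software disagreement. The function names and exact conventions (1-based bin labels, percentage-scaled difference) are illustrative assumptions, not the paper's or IBSI's definitive implementations.

```python
import numpy as np

def relative_difference(value, benchmark):
    # Relative difference (%) between an extracted feature value and
    # its IBSI benchmark; assumes a nonzero benchmark value.
    return 100.0 * (value - benchmark) / benchmark

def discretize_fixed_bin_number(roi, n_bins):
    # Fixed-bin-number discretization: split the ROI intensity range
    # into n_bins equal-width bins, labeled 1..n_bins.
    lo, hi = roi.min(), roi.max()
    bins = np.floor(n_bins * (roi - lo) / (hi - lo)) + 1
    return np.clip(bins, 1, n_bins).astype(int)

def discretize_fixed_bin_size(roi, bin_width, min_value=0.0):
    # Fixed-bin-size discretization: bins of constant intensity width,
    # anchored at min_value; the number of bins depends on the data.
    return (np.floor((roi - min_value) / bin_width) + 1).astype(int)
```

Because the two schemes generally assign different gray levels to the same voxels, any texture feature computed downstream (e.g., from a co-occurrence matrix over these labels) can differ between toolkits that default to different schemes.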