Paper Title
Improving Trustworthiness of AI Disease Severity Rating in Medical Imaging with Ordinal Conformal Prediction Sets
Paper Authors
Paper Abstract
The regulatory approval and broad clinical deployment of medical AI have been hampered by the perception that deep learning models fail in unpredictable and possibly catastrophic ways. A lack of statistically rigorous uncertainty quantification is a significant factor undermining trust in AI results. Recent developments in distribution-free uncertainty quantification offer practical solutions to these problems by providing reliability guarantees for black-box models on arbitrary data distributions, in the form of formally valid finite-sample prediction intervals. Our work applies these new uncertainty quantification methods -- specifically conformal prediction -- to a deep learning model that grades the severity of spinal stenosis in lumbar spine MRI. We demonstrate a technique for forming ordinal prediction sets that are guaranteed to contain the correct stenosis severity with a user-defined probability (confidence level). On a dataset of 409 MRI exams processed by the deep learning model, the conformal method provides tight coverage with small prediction set sizes. Furthermore, we explore the potential clinical applicability of flagging high-uncertainty predictions (large prediction sets): compared with a random sample of cases, flagged cases show an increased prevalence of significant imaging abnormalities (e.g., motion artifacts, metallic artifacts, and tumors) that could degrade confidence in predictive performance.
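To make the mechanics concrete, below is a minimal Python sketch of one way to construct calibrated ordinal prediction sets: grow a contiguous interval of severity grades outward from the model's most probable class until its softmax mass clears a threshold, and choose that threshold by split-conformal calibration so that the sets cover the true grade with probability at least 1 - alpha (assuming exchangeable calibration and test data). The function names, the specific greedy score, and the synthetic calibration data are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def greedy_ordinal_set(probs, threshold):
    """Grow a contiguous set of ordinal classes outward from the argmax,
    adding the more probable neighbor at each step, until the accumulated
    softmax mass reaches the threshold (or all classes are included)."""
    k = int(np.argmax(probs))
    lo, hi, mass = k, k, probs[k]
    while mass < threshold and (lo > 0 or hi < len(probs) - 1):
        left = probs[lo - 1] if lo > 0 else -1.0
        right = probs[hi + 1] if hi < len(probs) - 1 else -1.0
        if left >= right:
            lo -= 1
            mass += probs[lo]
        else:
            hi += 1
            mass += probs[hi]
    return lo, hi  # prediction set is {lo, lo+1, ..., hi}

def conformal_score(probs, y):
    """Score = cumulative mass of the greedy set at the moment it first
    covers the true label y (smaller means the model found y sooner)."""
    k = int(np.argmax(probs))
    lo, hi, mass = k, k, probs[k]
    while not (lo <= y <= hi):
        left = probs[lo - 1] if lo > 0 else -1.0
        right = probs[hi + 1] if hi < len(probs) - 1 else -1.0
        if left >= right:
            lo -= 1
            mass += probs[lo]
        else:
            hi += 1
            mass += probs[hi]
    return mass

def calibrate(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal calibration: the finite-sample-corrected
    ceil((n+1)(1-alpha))/n empirical quantile of the scores."""
    scores = np.sort([conformal_score(p, y) for p, y in zip(cal_probs, cal_labels)])
    n = len(scores)
    idx = min(int(np.ceil((n + 1) * (1 - alpha))) - 1, n - 1)
    return scores[idx]

# Synthetic stand-in for model outputs: K = 4 severity grades
# (e.g., none / mild / moderate / severe).
rng = np.random.default_rng(0)
logits = rng.normal(size=(500, 4))
cal_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
cal_labels = rng.integers(0, 4, size=500)

tau = calibrate(cal_probs, cal_labels, alpha=0.1)  # 90% target coverage
lo, hi = greedy_ordinal_set(cal_probs[0], tau)
print(f"prediction set: grades {lo}..{hi}")
```

Under this construction, every prediction set is a contiguous range of grades, which is the natural shape for an ordinal label space; cases whose calibrated sets span many grades would then be the natural candidates for the high-uncertainty flagging described above.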