Paper Title
Classification of Multiple Diseases on Body CT Scans using Weakly Supervised Deep Learning
Paper Authors
Paper Abstract
Purpose: To design multi-disease classifiers for body CT scans for three different organ systems using automatically extracted labels from radiology text reports. Materials & Methods: This retrospective study included a total of 12,092 patients (mean age 57 ± 18; 6,172 women) for model development and testing (2012 to 2017). Rule-based algorithms were used to extract 19,225 disease labels from 13,667 body CT scans from 12,092 patients. Using a three-dimensional DenseVNet, three organ systems were segmented: lungs and pleura; liver and gallbladder; and kidneys and ureters. For each organ system, a three-dimensional convolutional neural network classified no apparent disease versus four common diseases, for a total of 15 different labels across all three models. Testing was performed on a subset of 2,158 CT volumes relative to 2,875 manually derived reference labels from 2,133 patients (mean age 58 ± 18; 1,079 women). Performance was reported as receiver operating characteristic area under the curve (AUC) with 95% confidence intervals by the DeLong method. Results: Manual validation of the extracted labels confirmed 91% to 99% accuracy across the 15 different labels. AUCs for lungs and pleura labels were: atelectasis 0.77 (95% CI: 0.74, 0.81), nodule 0.65 (0.61, 0.69), emphysema 0.89 (0.86, 0.92), effusion 0.97 (0.96, 0.98), and no apparent disease 0.89 (0.87, 0.91). AUCs for liver and gallbladder were: hepatobiliary calcification 0.62 (95% CI: 0.56, 0.67), lesion 0.73 (0.69, 0.77), dilation 0.87 (0.84, 0.90), fatty 0.89 (0.86, 0.92), and no apparent disease 0.82 (0.78, 0.85). AUCs for kidneys and ureters were: stone 0.83 (95% CI: 0.79, 0.87), atrophy 0.92 (0.89, 0.94), lesion 0.68 (0.64, 0.72), cyst 0.70 (0.66, 0.73), and no apparent disease 0.79 (0.75, 0.83). Conclusion: Weakly supervised deep learning models were able to classify diverse diseases in multiple organ systems.
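The abstract reports each AUC with a 95% confidence interval computed by the DeLong method. As a rough illustration of what that involves for a single binary label, the sketch below computes the AUC and a DeLong-style interval with NumPy. It is not the authors' code: the function name `delong_auc_ci`, the toy labels and scores, and the clipping of the interval to [0, 1] are all assumptions made for this example.

```python
import numpy as np


def delong_auc_ci(y_true, y_score, z=1.959964):
    """Return (auc, lower, upper) for one binary label.

    Hypothetical helper for illustration only. z=1.959964 is the
    two-sided 95% quantile of the standard normal distribution.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]    # scores of disease-positive cases
    neg = y_score[~y_true]   # scores of disease-negative cases
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two positive and two negative cases")

    # Pairwise kernel: 1 if the positive outranks the negative, 0.5 on ties.
    diff = pos[:, None] - neg[None, :]
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)

    auc = psi.mean()           # Mann-Whitney estimate of the AUC
    v10 = psi.mean(axis=1)     # structural component per positive case
    v01 = psi.mean(axis=0)     # structural component per negative case

    # DeLong variance estimate of the AUC.
    var = v10.var(ddof=1) / len(pos) + v01.var(ddof=1) / len(neg)
    half_width = z * np.sqrt(var)

    # Assumption: clip the normal-approximation interval to the valid range.
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)


# Toy data, not taken from the study.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc, lower, upper = delong_auc_ci(labels, scores)
print(f"AUC {auc:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

With only four cases the interval is very wide, as expected. On a test set of the size described in the abstract, the same computation yields much narrower intervals, although the pairwise matrix grows with the product of positive and negative counts.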