Title
Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes
Authors
Abstract
Machine learning models for radiology benefit from large-scale data sets with high-quality abnormality labels. We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients. This is the largest multiply-annotated volumetric medical imaging data set reported. To annotate this data set, we developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports, achieving an average F-score of 0.976 (min 0.941, max 1.0). We also developed a model for multi-organ, multi-disease classification of chest CT volumes that uses a deep convolutional neural network (CNN). This model achieved an AUROC greater than 0.90 for 18 abnormalities and an average AUROC of 0.773 across all 83 abnormalities, demonstrating the feasibility of learning from unfiltered whole-volume CT data. We show that training on more labels improves performance significantly: for a subset of 9 labels (nodule, opacity, atelectasis, pleural effusion, consolidation, mass, pericardial effusion, cardiomegaly, and pneumothorax), the model's average AUROC increased by 10% when the number of training labels was increased from 9 to all 83. All code for volume preprocessing, automated label extraction, and the volume abnormality prediction model will be made publicly available. The 36,316 CT volumes and labels will also be made publicly available pending institutional approval.
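The abstract does not spell out the extraction rules, so the following is only an illustrative sketch of the general idea behind rule-based labeling from free-text reports. It is not the authors' method. It assumes a hypothetical three-label vocabulary (`LABEL_TERMS`), hypothetical sentence-level negation cues (`NEGATION_CUES`), and hypothetical helper names. It also shows how a per-label F-score against manual annotations could be computed. Averaging such per-label scores gives a summary of the kind reported above.

```python
import re

# Hypothetical vocabulary (not from the paper): each label maps to surface terms.
# The described system covers 83 abnormalities; three are shown for illustration.
LABEL_TERMS = {
    "nodule": ["nodule", "nodular"],
    "pleural_effusion": ["pleural effusion"],
    "pneumothorax": ["pneumothorax"],
}

# Hypothetical negation cues: a sentence containing one is treated as "normal".
NEGATION_CUES = re.compile(r"\b(no|without|negative for|resolved)\b", re.IGNORECASE)


def extract_labels(report: str) -> dict[str, int]:
    """Return a 0/1 value per label in LABEL_TERMS for one free-text report."""
    labels = {name: 0 for name in LABEL_TERMS}
    # Split into sentences so a negation cue only suppresses its own sentence.
    for sentence in re.split(r"(?<=[.!?])\s+", report):
        if NEGATION_CUES.search(sentence):
            continue
        lowered = sentence.lower()
        for name, terms in LABEL_TERMS.items():
            if any(term in lowered for term in terms):
                labels[name] = 1
    return labels


def f_score(predicted: list[int], gold: list[int]) -> float:
    """F1 = 2PR / (P + R) for one binary label across many reports."""
    tp = sum(p == 1 and g == 1 for p, g in zip(predicted, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(predicted, gold))
    fn = sum(p == 0 and g == 1 for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `extract_labels("A 4 mm nodule in the right upper lobe. No pleural effusion or pneumothorax.")` marks only `nodule` as present. Sentence-level negation is a deliberate simplification. A mixed sentence such as "No effusion, but a new nodule." would be wrongly suppressed, which is one reason practical systems use finer-grained phrase handling.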
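The reported metrics are per-abnormality AUROCs: a count of how many exceed 0.90, plus their unweighted mean over all 83 labels. A minimal sketch of how such a summary can be computed from a multi-label score matrix follows. It assumes rows are volumes, columns are abnormalities, and labels are 0/1. The function names are hypothetical, and AUROC is computed in its pairwise (Mann-Whitney) form, counting ties as one half.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5).

    This pairwise form is O(P * N), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(
    score_matrix: list[list[float]],
    label_matrix: list[list[int]],
    label_names: list[str],
) -> tuple[dict[str, float], float]:
    """Per-label AUROC over columns, plus their unweighted mean."""
    per_label = {}
    for j, name in enumerate(label_names):
        col_scores = [row[j] for row in score_matrix]
        col_labels = [row[j] for row in label_matrix]
        per_label[name] = auroc(col_scores, col_labels)
    mean = sum(per_label.values()) / len(per_label)
    return per_label, mean
```

Given `per_label, mean = mean_auroc(scores, labels, names)`, the count of well-classified abnormalities is `sum(v > 0.90 for v in per_label.values())`. For real data, scikit-learn's `roc_auc_score` computes the same per-label quantity more efficiently.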