Paper title
Analysis of a Deep Learning Model for 12-Lead ECG Classification Reveals Learned Features Similar to Diagnostic Criteria
Paper authors
Paper abstract
Despite their remarkable performance, deep neural networks have not been adopted in clinical practice, which is considered to be partially due to their lack of explainability. In this work, we apply attribution methods to a pre-trained deep neural network (DNN) for 12-lead electrocardiography classification to open this "black box" and understand the relationship between model predictions and learned features. We classify data from a public data set, and the attribution methods assign a "relevance score" to each sample of the classified signals. This allows us to analyze what the network learned during training, for which we propose quantitative methods: averaging relevance scores over a) classes, b) leads, and c) average beats. The analyses of relevance scores for atrial fibrillation (AF) and left bundle branch block (LBBB) compared to healthy controls show that their mean values a) increase with higher classification probability and correspond to false classifications when around zero, and b) correspond to clinical recommendations regarding which lead to consider. Furthermore, c) visible P-waves and concordant T-waves result in clearly negative relevance scores in AF and LBBB classification, respectively. In summary, our analysis suggests that the DNN learned features similar to cardiology textbook knowledge.
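The three averaging schemes named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the implementation, so everything here is an assumption made for illustration: the array layout `(records, leads, samples)`, the function names, the use of precomputed R-peak indices to align beats, and the synthetic random data standing in for real relevance scores.

```python
import numpy as np


def mean_relevance_by_class(relevance, labels):
    """(a) Mean relevance per class: average each record, then group by label.

    relevance: array of shape (n_records, n_leads, n_samples) (assumed layout).
    labels:    array of n_records class names (hypothetical labels).
    """
    per_record = relevance.mean(axis=(1, 2))
    return {str(c): float(per_record[labels == c].mean()) for c in np.unique(labels)}


def mean_relevance_per_lead(relevance):
    """(b) Mean relevance per lead: average over records and samples."""
    return relevance.mean(axis=(0, 2))


def average_beat_relevance(relevance_lead, r_peaks, half_window):
    """(c) Mean relevance over beats of one lead, aligned at R-peak indices.

    Beats whose window would run past the signal boundaries are skipped.
    """
    beats = [
        relevance_lead[p - half_window : p + half_window]
        for p in r_peaks
        if p - half_window >= 0 and p + half_window <= len(relevance_lead)
    ]
    if not beats:
        raise ValueError("no complete beat window fits inside the signal")
    return np.mean(beats, axis=0)


# Synthetic stand-in for attribution output: 4 records, 12 leads, 1000 samples.
rng = np.random.default_rng(0)
relevance = rng.normal(size=(4, 12, 1000))
labels = np.array(["AF", "AF", "LBBB", "healthy"])

by_class = mean_relevance_by_class(relevance, labels)
af_per_lead = mean_relevance_per_lead(relevance[labels == "AF"])
avg_beat = average_beat_relevance(
    relevance[0, 1], r_peaks=[100, 350, 600, 850, 990], half_window=50
)
```

In this sketch, `by_class` holds one scalar per class, `af_per_lead` holds twelve values for the AF records, and `avg_beat` is a 100-sample relevance profile for a single lead, which is the kind of view in which negative relevance around a P-wave or T-wave would become visible.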