Title
Similarity-based prediction of Ejection Fraction in Heart Failure Patients
Authors
Abstract
Biomedical research increasingly employs real-world evidence (RWE) to foster the discovery of novel clinical phenotypes and to better characterize the long-term effects of medical treatments. However, due to limitations inherent in the collection process, RWE often lacks key patient features, particularly when these features cannot be directly encoded using data standards such as ICD-10. Here we propose a novel data-driven statistical machine learning approach, named Feature Imputation via Local Likelihood (FILL), designed to infer missing features by exploiting feature similarity between patients. We test our method on a particularly challenging problem: differentiating heart failure patients with reduced versus preserved ejection fraction (HFrEF and HFpEF, respectively). The complexity of the task stems from three aspects: the two conditions share many characteristics and treatments, only part of the relevant diagnoses may have been recorded, and information on ejection fraction is often missing from RWE datasets. Despite these difficulties, our method infers which heart failure patients have HFpEF with a precision above 80% across multiple scenarios in two RWE datasets containing 11,950 and 10,051 heart failure patients. This improves on classical approaches such as logistic regression and random forest, which achieved a precision below 73%. Finally, this approach allows us to analyse which features are commonly associated with HFpEF patients. For example, we found that specific diagnostic codes for atrial fibrillation and a personal history of long-term anticoagulant use are often key in identifying HFpEF patients.
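The abstract does not spell out how FILL works, so the following is only a rough sketch of the general idea of inferring a missing label from similar patients, not the authors' method. It assumes each patient is a set of binary features, uses Jaccard similarity, and takes a similarity-weighted vote over the k most similar labelled patients. The function names, the feature names and the toy records are all hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

Patient = Tuple[FrozenSet[str], int]  # (feature set, label: 1 = HFpEF, 0 = HFrEF)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def local_estimate(query: FrozenSet[str], labelled: List[Patient], k: int = 3) -> float:
    """Similarity-weighted share of positive labels among the k nearest patients.

    Illustrative stand-in for a local estimate; NOT the FILL algorithm itself.
    """
    scored = sorted(
        ((jaccard(query, feats), label) for feats, label in labelled),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0.0:
        # No labelled patient shares any feature: fall back to overall prevalence.
        return sum(label for _, label in labelled) / len(labelled)
    return sum(sim * label for sim, label in scored) / total


def precision(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Precision = true positives / all predicted positives."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    return tp / (tp + fp) if tp + fp else 0.0


# Toy, made-up records; feature names are placeholders, not real codes.
labelled = [
    (frozenset({"afib", "anticoag", "hypertension"}), 1),
    (frozenset({"afib", "anticoag"}), 1),
    (frozenset({"mi", "icd_implant"}), 0),
    (frozenset({"mi", "hypertension"}), 0),
]

score_a = local_estimate(frozenset({"afib", "anticoag"}), labelled)
score_b = local_estimate(frozenset({"mi"}), labelled)
print(score_a, score_b)
```

Thresholding such a score (for example at 0.5) gives a predicted label, and `precision` then measures the share of predicted HFpEF patients who truly have HFpEF, which is the metric the abstract reports.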