Title

An efficient representation of chronological events in medical texts

Authors

Kormilitzin, Andrey; Vaci, Nemanja; Liu, Qiang; Ni, Hao; Nenadic, Goran; Nevado-Holgado, Alejo

Abstract

In this work we addressed the problem of capturing sequential information contained in longitudinal electronic health records (EHRs). Clinical notes, a particular type of EHR data, are a rich source of information, and practitioners often develop clever solutions for maximising the sequential information contained in free text. We proposed a systematic methodology for learning from chronological events available in clinical notes. The proposed path signature framework creates a non-parametric hierarchical representation of sequential events of any type, which can be used as features for downstream statistical learning tasks. The methodology was developed and externally validated using the largest secondary care mental health EHR dataset in the UK, on the specific task of predicting the survival risk of patients diagnosed with Alzheimer's disease. The signature-based model was compared to a common survival random forest model. Our results showed a 15.4% increase in risk prediction AUC at the time point of 20 months after the first admission to a specialist memory clinic, and the signature method outperformed the baseline mixed-effects model by 13.2%.
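The abstract describes the path signature only at a high level. As a rough illustration, and not the authors' implementation, the sketch below computes the signature truncated at level 2 for a piecewise-linear path using Chen's identity. The encoding of a patient timeline as (months since first admission, cumulative event count) points and the function names `signature_depth2` and `signature_features` are hypothetical. Level 1 holds the total increments; level 2 holds iterated integrals whose antisymmetric part (the Lévy area) changes when the order of events changes, which is what makes signatures sensitive to chronology.

```python
from typing import List, Sequence, Tuple


def signature_depth2(
    path: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Signature of a piecewise-linear path, truncated at level 2.

    path: sequence of points, each with d coordinates.
    Returns (s1, s2) with s1[i] = S^i and s2[i][j] = S^{ij}.
    """
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        inc = [c - p for p, c in zip(prev, cur)]
        # Chen's identity for appending one linear segment (uses s1 BEFORE update):
        # S^{ij} <- S^{ij} + S^i * inc^j + inc^i * inc^j / 2
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * inc[j] + inc[i] * inc[j] / 2.0
        # Level 1 is simply the accumulated increment.
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


def signature_features(path: Sequence[Sequence[float]]) -> List[float]:
    """Flatten levels 1 and 2 into one feature vector (hypothetical helper)."""
    s1, s2 = signature_depth2(path)
    return s1 + [x for row in s2 for x in row]


if __name__ == "__main__":
    # Hypothetical timeline: (months since first admission, cumulative events).
    timeline = [(0, 0), (3, 1), (7, 1), (12, 2)]
    print(signature_features(timeline))
```

Reversing the order of two segments in a path leaves level 1 unchanged but swaps the off-diagonal level-2 terms, so the flattened features can distinguish event orderings that plain counts would not. Higher truncation levels follow the same Chen recursion with higher-order tensors.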
