Paper Title

Enriching Unsupervised User Embedding via Medical Concepts

Authors

Xiaolei Huang, Franck Dernoncourt, Mark Dredze

Abstract

Clinical notes in Electronic Health Records (EHR) provide richly documented patient information that can be used to infer phenotypes for disease diagnosis and to study patient characteristics for cohort selection. Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervision. Medical concepts extracted from clinical notes contain rich connections between patients and their clinical categories. However, existing unsupervised approaches to learning user embeddings from clinical notes do not explicitly incorporate medical concepts. In this study, we propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora, MIMIC-III and Diabetes. We evaluate user embeddings on both extrinsic and intrinsic tasks, including phenotype classification, in-hospital mortality prediction, patient retrieval, and patient relatedness. Experiments on the two clinical corpora show that our approach exceeds unsupervised baselines and that incorporating medical concepts can significantly improve baseline performance.
