Paper Title
Public Health Informatics: Proposing Causal Sequence of Death Using Neural Machine Translation
Paper Authors
Paper Abstract
Each year there are nearly 57 million deaths worldwide, including over 2.7 million in the United States. Timely, accurate, and complete death reporting is critical to public health, because institutions and government agencies rely on death reports to analyze vital statistics and to formulate responses to communicable diseases. Inaccurate death reporting may misdirect public health policies. Determining the causes of death is nevertheless challenging, even for experienced physicians. To help physicians report causes of death accurately, we present an advanced AI approach that determines a chronologically ordered sequence of clinical conditions leading to death, based on the decedent's last hospital discharge record. The sequence of clinical codes on the death report is called the causal chain of death and is coded in the tenth revision of the International Statistical Classification of Diseases (ICD-10); the priority-ordered clinical conditions on the discharge record are coded in ICD-9, in line with the ICD-9-CM Official Guidelines for Coding and Reporting. We identify three challenges in proposing the causal chain of death: two versions of the coding system in the clinical codes, medical domain knowledge conflict, and data interoperability. To overcome the first challenge in this sequence-to-sequence problem, we apply neural machine translation models to generate the target sequence. Along with three accuracy metrics, we evaluate the quality of the generated sequences with the BLEU (BiLingual Evaluation Understudy) score and achieve 16.04 out of 100. To address the second challenge, we incorporate expert-verified medical domain knowledge as a constraint when generating the output sequence, so as to exclude infeasible causal chains. Lastly, to address the third challenge, we demonstrate the usability of our work in a Fast Healthcare Interoperability Resources (FHIR) interface.
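The abstract does not say how the domain-knowledge constraint is applied during generation. As a rough illustration only, one simple option is to filter scored candidate chains after decoding and keep the best one that violates no constraint. Everything in the sketch below is an assumption made for illustration: the names `INFEASIBLE_PAIRS`, `is_feasible`, and `best_feasible`, the convention that a chain runs from underlying cause to immediate cause, and the single placeholder pair, which is not taken from the paper's expert-verified knowledge base.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

Chain = Sequence[str]  # ICD-10 codes, assumed ordered from underlying to immediate cause

# Hypothetical constraint set: (cause, effect) pairs treated as infeasible.
# The pair below is a placeholder for illustration, not expert-verified knowledge.
INFEASIBLE_PAIRS: Set[Tuple[str, str]] = {
    ("I46", "C34"),
}


def is_feasible(chain: Chain, infeasible: Set[Tuple[str, str]] = INFEASIBLE_PAIRS) -> bool:
    """Return True if no adjacent (cause, effect) pair in the chain is infeasible."""
    return all((cause, effect) not in infeasible
               for cause, effect in zip(chain, chain[1:]))


def best_feasible(candidates: Iterable[Tuple[Chain, float]],
                  infeasible: Set[Tuple[str, str]] = INFEASIBLE_PAIRS) -> Optional[Chain]:
    """Pick the highest-scoring candidate chain that passes the feasibility check.

    `candidates` are (chain, score) pairs, e.g. beam-search outputs with
    log-probabilities. Returns None when every candidate is infeasible.
    """
    feasible = [(chain, score) for chain, score in candidates
                if is_feasible(chain, infeasible)]
    if not feasible:
        return None
    return max(feasible, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    beams = [
        (["I46", "C34"], -0.5),         # higher score, but contains the infeasible pair
        (["C34", "J18", "I46"], -0.9),  # lower score, passes the check
    ]
    print(best_feasible(beams))
```

Filtering after decoding is the simplest variant. Applying the same check inside beam search, so that infeasible continuations are pruned at each step, would be a natural alternative, but whether the authors do either is not stated in the abstract.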