Paper Title
Using Machine Learning to Fuse Verbal Autopsy Narratives and Binary Features in the Analysis of Deaths from Hyperglycaemia
Authors
Abstract
Low- and middle-income countries face challenges arising from a lack of data on cause of death (COD), which can limit decisions on population health and disease management. A verbal autopsy (VA) can provide information about COD in areas without robust death registration systems. A VA consists of structured data, combining numeric and binary features, and unstructured data in the form of an open-ended narrative text. This study assesses the performance of various machine learning approaches when analyzing both the structured and unstructured components of the VA report. The algorithms were trained and tested via cross-validation in three settings: binary features only, text features only, and a combination of binary and text features, all derived from VA reports from rural South Africa. The results indicate that narrative text features contain valuable information for determining COD and that combining binary and text features improves automated COD classification.
Keywords: Diabetes Mellitus, Verbal Autopsy, Cause of Death, Machine Learning, Natural Language Processing
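The three feature settings described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy records, the bag-of-words text representation, the function names, and the contiguous fold split are hypothetical and are not taken from the study, which does not specify its feature extraction or classifiers in this abstract.

```python
from collections import Counter

# Hypothetical toy records: (binary symptom indicators, narrative text, label).
# These are invented for illustration and are not data from the study.
RECORDS = [
    ([1, 0, 1], "excessive thirst and frequent urination", "diabetes"),
    ([0, 1, 0], "chest pain and shortness of breath", "other"),
    ([1, 0, 0], "high sugar and thirst for weeks", "diabetes"),
    ([0, 1, 1], "cough with fever and chest pain", "other"),
]


def build_vocabulary(texts):
    """Return a sorted list of the distinct lowercase tokens in the texts."""
    return sorted({token for text in texts for token in text.lower().split()})


def text_features(text, vocabulary):
    """Bag-of-words counts of one narrative over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[token] for token in vocabulary]


def make_features(records, vocabulary, setting):
    """Build feature vectors for the 'binary', 'text', or 'combined' setting."""
    rows = []
    for binary, narrative, _label in records:
        if setting == "binary":
            rows.append(list(binary))
        elif setting == "text":
            rows.append(text_features(narrative, vocabulary))
        elif setting == "combined":
            # Fusion here is plain concatenation of the two feature vectors.
            rows.append(list(binary) + text_features(narrative, vocabulary))
        else:
            raise ValueError(f"unknown setting: {setting}")
    return rows


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


vocabulary = build_vocabulary(narrative for _, narrative, _ in RECORDS)
features = {s: make_features(RECORDS, vocabulary, s) for s in ("binary", "text", "combined")}
folds = list(k_fold_indices(len(RECORDS), k=2))
```

In a real evaluation the vocabulary would be built from the training fold only, so that no information from the test narratives leaks into the features; the sketch builds it once over all records purely to keep the example short. A classifier of choice would then be fitted on each training fold and scored on the matching test fold for each of the three settings.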