Paper Title
Predicting Treatment Adherence of Tuberculosis Patients at Scale
Paper Authors
Paper Abstract
Tuberculosis (TB), an infectious bacterial disease, is a significant cause of death, especially in low-income countries, with an estimated ten million new cases reported globally in 2020. While TB is treatable, non-adherence to the medication regimen is a significant cause of morbidity and mortality. Proactively identifying patients at risk of dropping off their medication regimen therefore enables corrective measures to mitigate adverse outcomes. Using a proxy measure of extreme non-adherence and a dataset of nearly 700,000 patients from four states in India, we formulate and solve the machine learning (ML) problem of early prediction of non-adherence based on a custom rank-based metric. We train ML models with future country-wide, large-scale deployment in mind and evaluate them against baselines, achieving a $\sim 100\%$ lift over rule-based baselines and a $\sim 214\%$ lift over a random classifier. In the process, we deal with various issues, including data quality, high-cardinality categorical data, low target prevalence, distribution shift, variation across cohorts, algorithmic fairness, and the need for robustness and explainability. Our findings indicate that risk stratification of patients by non-adherence risk is a viable ML solution that can be deployed at scale. As the official AI partner of India's Central TB Division, we are working on multiple city and state-level pilots with the goal of pan-India deployment.
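The abstract reports lift figures but does not define the custom rank-based metric. As a rough illustration of how a lift over a random classifier can be computed, the sketch below assumes a simple stand-in metric: the share of all non-adherent patients captured in the top fraction of patients ranked by predicted risk. The function names, the toy data, and the choice of metric are all hypothetical and are not taken from the paper.

```python
def recall_at_top_fraction(scores, labels, fraction):
    """Share of all positives found among the top `fraction` of patients by score.

    Assumed stand-in metric; the paper's actual rank-based metric may differ.
    """
    n_top = max(1, int(len(scores) * fraction))
    # Patient indices ordered from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    positives = sum(labels)
    if positives == 0:
        raise ValueError("labels contain no positive cases")
    captured = sum(labels[i] for i in ranked[:n_top])
    return captured / positives


def lift_percent(model_value, baseline_value):
    """Relative improvement of the model over a baseline, in percent."""
    return 100.0 * (model_value - baseline_value) / baseline_value


# Toy data (hypothetical): 20 patients, 4 of them non-adherent (label 1).
scores = [20 - i for i in range(20)]      # higher score = higher predicted risk
labels = [1, 1, 1, 0, 0, 1] + [0] * 14    # 3 of the 4 positives rank in the top 5

fraction = 0.25
model_recall = recall_at_top_fraction(scores, labels, fraction)

# A random ranking captures, in expectation, a share of positives equal to
# the fraction of patients selected.
random_recall = fraction

print(model_recall)                               # 0.75
print(lift_percent(model_recall, random_recall))  # 200.0
```

Under this assumed metric, a lift of about 214% over a random classifier would mean the model captures roughly 3.14 times as many non-adherent patients in the selected group as random selection would.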