Paper Title

Interpreting County Level COVID-19 Infection and Feature Sensitivity using Deep Learning Time Series Models

Authors

Md Khairul Islam, Di Zhu, Yingzheng Liu, Andrej Erkelens, Nick Daniello, Judy Fox

Abstract

Interpretable machine learning plays a key role in healthcare because feature importance in deep learning model predictions is difficult to understand. We propose a novel framework that uses deep learning to study the feature sensitivity of model predictions. This work combines sensitivity analysis with heterogeneous time-series deep learning model prediction, which corresponds to the interpretation of spatio-temporal features. We forecast county-level COVID-19 infection using the Temporal Fusion Transformer. We then use a sensitivity analysis that extends the Morris method to measure how sensitive the outputs are to perturbations of our static and dynamic input features. The significance of the work is grounded in real-world COVID-19 infection prediction with highly non-stationary, fine-grained, and heterogeneous data. 1) Our model captures the detailed daily changes of temporal and spatial model behaviors and achieves high prediction performance compared to a PyTorch baseline. 2) By analyzing the Morris sensitivity indices and attention patterns, we decipher the meaning of feature importance with observational population and dynamic model changes. 3) We have collected 2.5 years of socioeconomic and health features over 3142 US counties, such as observed cases and deaths, along with a number of static features (age distribution, health disparity, and industry) and dynamic features (vaccination, disease spread, transmissible cases, and social distancing). Using the proposed framework, we conduct extensive experiments and show that our model can learn complex interactions and predict daily infection at the county level. Modeling disease infection at the county level with a hybrid measure of prediction and description accuracy based on the Morris index is a central idea that sheds light on individual feature interpretation via sensitivity analysis.
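
The perturbation idea behind the Morris-style sensitivity analysis mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `morris_mu_star`, the step size `delta`, and the linear `toy_model` (a stand-in for a trained forecaster) are all assumptions made for illustration. Each feature is nudged by a small amount, one at a time, and the mean absolute change in the output per unit of perturbation is reported.

```python
import random


def morris_mu_star(predict, rows, delta=0.01):
    """Simplified Morris-style index: mean absolute elementary effect per feature.

    predict: callable mapping one feature vector (list of floats) to a float.
    rows:    list of feature vectors used as base points.
    delta:   perturbation size (an assumed value, not taken from the paper).
    """
    n_features = len(rows[0])
    base = [predict(r) for r in rows]  # unperturbed outputs
    mu_star = []
    for j in range(n_features):
        total = 0.0
        for r, y0 in zip(rows, base):
            perturbed = list(r)
            perturbed[j] += delta  # perturb one feature at a time
            total += abs(predict(perturbed) - y0) / delta
        mu_star.append(total / len(rows))
    return mu_star


def toy_model(x):
    # Hypothetical stand-in for a trained model: feature 1 has no influence.
    return 3.0 * x[0] + 0.0 * x[1] - 2.0 * x[2]


random.seed(0)
rows = [[random.random() for _ in range(3)] for _ in range(200)]
mu_star = morris_mu_star(toy_model, rows)
print([round(v, 3) for v in mu_star])
```

For a linear toy model the index recovers the absolute value of each coefficient, so a feature the model ignores receives an index near zero. With a real forecaster, `predict` would wrap the trained network and the perturbation would be applied to scaled static or dynamic inputs.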