Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning

Paper title

Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning

Authors

Mathieu Reymond, Conor F. Hayes, Lander Willem, Roxana Rădulescu, Steven Abrams, Diederik M. Roijers, Enda Howley, Patrick Mannion, Niel Hens, Ann Nowé, Pieter Libin

Abstract

Infectious disease outbreaks can have a disruptive impact on public health and societal processes. As decision making in the context of epidemic mitigation is hard, reinforcement learning provides a methodology to automatically learn prevention strategies in combination with complex epidemic models. Current research focuses on optimizing policies with respect to a single objective, such as the pathogen's attack rate. However, as the mitigation of epidemics involves distinct and possibly conflicting criteria (among others, prevalence, mortality, morbidity, and cost), a multi-objective approach is warranted to learn balanced policies. To lift this decision-making process to real-world epidemic models, we apply deep multi-objective reinforcement learning and build upon a state-of-the-art algorithm, Pareto Conditioned Networks (PCN), to learn a set of solutions that approximates the Pareto front of the decision problem. We consider the first wave of the Belgian COVID-19 epidemic, which was mitigated by a lockdown, and study different deconfinement strategies, aiming to minimize both COVID-19 cases (i.e., infections and hospitalizations) and the societal burden that is induced by the applied mitigation measures. We contribute a multi-objective Markov decision process that encapsulates the stochastic compartment model that was used to inform policy makers during the COVID-19 epidemic. As these social mitigation measures are implemented in a continuous action space that modulates the contact matrix of the age-structured epidemic model, we extend PCN to this setting. We evaluate the solution set returned by PCN and observe that it correctly learns to reduce the social burden whenever the hospitalization rates are sufficiently low. In this work, we thus show that multi-objective reinforcement learning is attainable in complex epidemiological models and provides essential insights to balance complex mitigation policies.
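
For readers unfamiliar with the term, the notion of a Pareto front used in the abstract can be illustrated with a minimal sketch. This is not the authors' PCN implementation; it is only a brute-force filter for non-dominated points. The function names `dominates` and `pareto_front` and the example return vectors are hypothetical, and objectives are assumed to be written as quantities to maximize (here, negated hospitalizations and negated social burden).

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (maximization).

    `a` dominates `b` when it is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q != p)
    ]


# Hypothetical policy returns: (-hospitalizations, -social burden).
# These numbers are made up for illustration only.
returns = [
    (-100.0, -50.0),
    (-80.0, -70.0),
    (-120.0, -60.0),
    (-80.0, -40.0),
    (-200.0, -10.0),
]

front = pareto_front(returns)
print(front)
```

In this toy data, the policy with returns `(-80.0, -40.0)` beats the first three on both objectives, while `(-200.0, -10.0)` survives because it trades many more hospitalizations for a much lower social burden. A set of such incomparable trade-offs is what a multi-objective method aims to approximate.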

Paper title