Optimal Dispatch in Emergency Service System via Reinforcement Learning

Paper Title

Optimal Dispatch in Emergency Service System via Reinforcement Learning

Authors

Hua, Cheng; Zaman, Tauhid

Abstract

In the United States, medical responses by fire departments have increased by 367% over the last four decades. This has made it critical for decision makers in emergency response departments to use existing resources efficiently. In this paper, we model the ambulance dispatch problem as an average-cost Markov decision process and present a policy iteration approach to find an optimal dispatch policy. We then propose an alternative formulation using post-decision states that is shown to be mathematically equivalent to the original model, but with a much smaller state space. We present a temporal difference learning approach to the dispatch problem based on the post-decision states. In our numerical experiments, we show that the resulting temporal-difference policy outperforms the benchmark myopic policy. Our findings suggest that emergency response departments can improve their performance at minimal to no cost.
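
To make the idea of learning on post-decision states concrete, the following is a minimal sketch of average-cost TD(0) on a toy dispatch problem. It is not the authors' model or code. The two-unit, two-region setup, every constant, and all function names (`myopic_action`, `greedy_action`, `step`, `train`) are hypothetical choices made for illustration. The post-decision state here is simply the tuple of busy flags right after a dispatch, so the value table never has to index the region of the incoming call.

```python
import random

# Hypothetical toy instance; every number below is an illustrative assumption.
RESPONSE_TIME = [[1.0, 4.0],   # unit 0 to region 0, region 1
                 [4.0, 1.0]]   # unit 1 to region 0, region 1
CALL_PROB = [0.3, 0.1]         # per-step probability of a call in each region
FINISH_PROB = 0.2              # per-step probability that a given busy unit frees up
LOST_CALL_COST = 10.0          # penalty when every unit is busy


def free_units(busy):
    return [i for i, b in enumerate(busy) if not b]


def dispatch(busy, unit):
    """Post-decision state: the busy flags after sending `unit`."""
    return tuple(1 if j == unit else b for j, b in enumerate(busy))


def myopic_action(busy, region):
    """Benchmark: send the closest free unit, or None if all are busy."""
    units = free_units(busy)
    return min(units, key=lambda i: RESPONSE_TIME[i][region]) if units else None


def greedy_action(busy, region, V):
    """Send the unit minimising immediate cost plus post-decision value."""
    units = free_units(busy)
    if not units:
        return None
    return min(units, key=lambda i: RESPONSE_TIME[i][region]
               + V.get(dispatch(busy, i), 0.0))


def step(busy, rng):
    """Sample one exogenous event; returns (busy flags, call region or None)."""
    u = rng.random()
    if u < CALL_PROB[0]:
        return busy, 0
    if u < CALL_PROB[0] + CALL_PROB[1]:
        return busy, 1
    u -= sum(CALL_PROB)
    for i, b in enumerate(busy):
        if b:
            if u < FINISH_PROB:
                return tuple(0 if j == i else x for j, x in enumerate(busy)), None
            u -= FINISH_PROB
    return busy, None  # nothing happens in this step


def train(steps=20000, alpha=0.05, beta=0.01, eps=0.1, seed=0):
    """Average-cost TD(0) over post-decision states with epsilon-greedy dispatch."""
    rng = random.Random(seed)
    V, avg_cost, post = {}, 0.0, (0, 0)
    for _ in range(steps):
        busy, region = step(post, rng)
        cost, next_post = 0.0, busy
        if region is not None:
            units = free_units(busy)
            if not units:
                cost = LOST_CALL_COST          # call is lost, state unchanged
            else:
                explore = rng.random() < eps
                unit = rng.choice(units) if explore else greedy_action(busy, region, V)
                cost = RESPONSE_TIME[unit][region]
                next_post = dispatch(busy, unit)
        # Differential TD error: cost relative to the running average cost.
        delta = cost - avg_cost + V.get(next_post, 0.0) - V.get(post, 0.0)
        V[post] = V.get(post, 0.0) + alpha * delta
        avg_cost += beta * delta
        post = next_post
    return V, avg_cost
```

Calling `train()` returns a value table with at most four entries (one per busy-flag tuple) together with a running estimate of the average cost per step. With an empty value table `greedy_action` reduces to `myopic_action`, which is one way to see the myopic policy as the starting point that learning then adjusts.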
