Paper title

A continuous time formulation of stochastic dual control to avoid the curse of dimensionality

Authors

Péron, Martin; Baker, Christopher M.; Hughes, Barry D.; Chadès, Iadine

Abstract

Dual control denotes a class of control problems where the parameters governing the system are imperfectly known. The challenge is to find the optimal balance between probing, i.e. exciting the system to understand it more, and caution, i.e. selecting conservative controls based on current knowledge to achieve the control objective. Dynamic programming techniques can achieve this optimal trade-off. However, while dynamic programming performs well with discrete state and time, it is not well-suited to problems with continuous time-frames or continuous or unbounded state spaces. Another limitation is that multidimensional states often cause the dynamic programming approaches to be intractable. In this paper, we investigate whether continuous-time optimal control tools could help circumvent these caveats whilst still achieving the probing-caution balance. We introduce a stylized problem where the state is governed by one of two differential equations. It is initially unknown which differential equation governs the system, so we must simultaneously determine the true differential equation and control the system to the desired state. We show how this problem can be transformed to apply optimal control tools, and compare the performance of this approach to a dynamic programming approach. Our results suggest that the optimal control algorithm rivals dynamic programming on small problems, achieving the right balance between aggressive and smoothly varying controls. In contrast to dynamic programming, the optimal control approach remains tractable when several states are to be controlled simultaneously.
