Optimal Control of Partially Observable Markov Decision Processes with Finite Linear Temporal Logic Constraints

Paper Title

Optimal Control of Partially Observable Markov Decision Processes with Finite Linear Temporal Logic Constraints

Authors

Krishna C. Kalagarla, Dhruva Kartik, Dongming Shen, Rahul Jain, Ashutosh Nayyar, Pierluigi Nuzzo

Abstract

Autonomous agents often operate in scenarios where the state is partially observed. In addition to maximizing their cumulative reward, agents must execute complex tasks with rich temporal and logical structures. These tasks can be expressed using temporal logic languages like finite linear temporal logic (LTL_f). This paper, for the first time, provides a structured framework for designing agent policies that maximize the reward while ensuring that the probability of satisfying the temporal logic specification is sufficiently high. We reformulate the problem as a constrained partially observable Markov decision process (POMDP) and provide a novel approach that can leverage off-the-shelf unconstrained POMDP solvers for solving it. Our approach guarantees approximate optimality and constraint satisfaction with high probability. We demonstrate its effectiveness by implementing it on several models of interest.
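
To make the problem statement in the abstract concrete, the constrained formulation might be sketched as below. This is only an illustration under stated assumptions: the symbols $\pi$ (policy), $r$ (reward), $T$ (horizon), $\varphi$ (the LTL_f specification), $\delta$ (tolerated violation probability), and $\lambda$ (multiplier) are notation introduced here, not taken from the paper, and the Lagrangian relaxation shown is one standard way to reduce a constrained problem to unconstrained ones, not necessarily the authors' exact construction.

```latex
% Illustrative notation only; none of these symbols are taken from the paper.
% Constrained problem: maximize expected cumulative reward subject to a
% lower bound on the probability of satisfying the LTL_f specification.
\[
\begin{aligned}
\max_{\pi}\quad & R(\pi) := \mathbb{E}^{\pi}\Big[\sum_{t=0}^{T} r(s_t, a_t)\Big] \\
\text{s.t.}\quad & P^{\pi}(\varphi) := \Pr^{\pi}\big(s_0 s_1 \cdots s_T \models \varphi\big) \;\ge\; 1 - \delta
\end{aligned}
\]

% Lagrangian relaxation with multiplier \lambda \ge 0 (an assumed, standard technique).
\[
L(\pi, \lambda) \;=\; R(\pi) + \lambda\,\big(P^{\pi}(\varphi) - (1 - \delta)\big)
\]
```

For a fixed $\lambda$, maximizing $L(\pi, \lambda)$ over $\pi$ has no constraint, so it can in principle be handed to an unconstrained POMDP solver, provided satisfaction of $\varphi$ can be encoded as a reward (for example, a terminal bonus of $\lambda$ when an automaton tracking $\varphi$ ends in an accepting state). The multiplier is then adjusted until the constraint holds to the desired tolerance.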
