Title
Model-Free Reinforcement Learning for Stochastic Games with Linear Temporal Logic Objectives
Authors
Abstract
We study the problem of synthesizing control strategies for Linear Temporal Logic (LTL) objectives in unknown environments. We model this problem as a turn-based zero-sum stochastic game between the controller and the environment, in which the transition probabilities and the model topology are fully unknown. The winning condition for the controller in this game is the satisfaction of the given LTL specification, which can be captured by the acceptance condition of a deterministic Rabin automaton (DRA) derived directly from the LTL specification. We introduce a model-free reinforcement learning (RL) methodology to find a strategy that maximizes the probability of satisfying a given LTL specification when the Rabin condition of the derived DRA has a single accepting pair. We then generalize this approach to LTL formulas for which the Rabin condition has a larger number of accepting pairs, providing a lower bound on the satisfaction probability. Finally, we illustrate the applicability of our RL method on two motion planning case studies.
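To make the turn-based zero-sum setting concrete, the sketch below runs tabular Q-learning with a minimax backup on a tiny hypothetical game: the controller maximizes at its states, the environment minimizes at its states. All state names, actions, and rewards here are invented for illustration; this is not the paper's algorithm, which additionally composes the game with a DRA and learns against its Rabin acceptance condition.

```python
import random

# Hypothetical toy game (for illustration only):
# state 0 (controller's turn): "safe"  -> terminal, reward 0.5
#                              "risky" -> state 1 (environment's turn)
# state 1 (environment's turn): "win"  -> terminal, reward 1.0
#                               "lose" -> terminal, reward 0.0
# An adversarial environment picks "lose", so the game value at
# state 0 is 0.5 and the optimal controller action is "safe".

ACTIONS = {0: ["safe", "risky"], 1: ["win", "lose"]}
MAX_PLAYER = {0: True, 1: False}  # controller maximizes, environment minimizes

def step(state, action):
    """Return (next_state, reward, done)."""
    if state == 0:
        return (None, 0.5, True) if action == "safe" else (1, 0.0, False)
    return (None, 1.0, True) if action == "win" else (None, 0.0, True)

def minimax_q(episodes=5000, alpha=0.1, gamma=1.0, eps=0.2, seed=0):
    """Tabular Q-learning where the bootstrap value of the next state
    is a max or a min, depending on whose turn it is there."""
    rng = random.Random(seed)
    Q = {s: {a: 0.0 for a in acts} for s, acts in ACTIONS.items()}
    for _ in range(episodes):
        s = 0
        while True:
            acts = ACTIONS[s]
            # epsilon-greedy play for whichever player owns state s
            greedy = (max if MAX_PLAYER[s] else min)(acts, key=lambda a: Q[s][a])
            a = rng.choice(acts) if rng.random() < eps else greedy
            s2, r, done = step(s, a)
            if done:
                target = r
            else:
                vals = Q[s2].values()
                target = r + gamma * (max(vals) if MAX_PLAYER[s2] else min(vals))
            Q[s][a] += alpha * (target - Q[s][a])
            if done:
                break
            s = s2
    return Q

Q = minimax_q()
# The adversary drives the "risky" branch to value 0, so "safe" wins.
print(round(Q[0]["safe"], 2), round(Q[0]["risky"], 2))  # → 0.5 0.0
```

In the paper's setting the states would be pairs (game state, DRA state) and the reward signal would come from the Rabin accepting pair rather than from hand-coded terminal payoffs, but the alternating max/min backup above is the core game-theoretic ingredient.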