Accelerated Reinforcement Learning for Temporal Logic Control Objectives

Paper Title

Accelerated Reinforcement Learning for Temporal Logic Control Objectives

Authors

Kantaros, Yiannis

Abstract

This paper addresses the problem of learning control policies for mobile robots, modeled as unknown Markov Decision Processes (MDPs), that are tasked with temporal logic missions, such as sequencing, coverage, or surveillance. The MDP captures uncertainty in the workspace structure and the outcomes of control decisions. The control objective is to synthesize a control policy that maximizes the probability of accomplishing a high-level task, specified as a Linear Temporal Logic (LTL) formula. To address this problem, we propose a novel accelerated model-based reinforcement learning (RL) algorithm for LTL control objectives that is capable of learning control policies significantly faster than related approaches. Its sample-efficiency relies on biasing exploration towards directions that may contribute to task satisfaction. This is accomplished by leveraging an automaton representation of the LTL task as well as a continuously learned MDP model. Finally, we provide comparative experiments that demonstrate the sample efficiency of the proposed method against recent RL methods for LTL objectives.
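The abstract does not state the exploration rule itself. As a rough, hypothetical illustration of the general idea only (not the author's algorithm), the sketch below assumes that the LTL formula has already been translated into an automaton given as a dict `transitions[(q, label)] -> q_next` with a set of accepting states. It also assumes the common product-MDP convention q' = δ(q, L(s')), and it uses a count-based maximum-likelihood estimate of the unknown transition probabilities as a stand-in for the "continuously learned MDP model". Exploration is biased toward the action whose estimated next automaton state is, in expectation, the fewest automaton transitions away from an accepting state. All names (`automaton_distances`, `LearnedModel`, `biased_action`, `select_action`, `epsilon`, `delta`) are illustrative. Plain reachability distance is a simplification: Büchi-style acceptance, which requires visiting accepting states infinitely often, is not captured here.

```python
import random
from collections import defaultdict, deque

INF = float("inf")


def automaton_distances(states, transitions, accepting):
    """Fewest automaton transitions from each state to any accepting state.

    `transitions` maps (q, label) -> q_next (a hypothetical format).
    States that cannot reach an accepting state get INF.
    """
    # Reverse the edges so a single BFS from the accepting set suffices.
    reverse = defaultdict(set)
    for (q, _label), q_next in transitions.items():
        reverse[q_next].add(q)
    dist = {q: INF for q in states}
    queue = deque()
    for q in accepting:
        dist[q] = 0
        queue.append(q)
    while queue:
        q_next = queue.popleft()
        for q in reverse[q_next]:
            if dist.get(q, INF) == INF:
                dist[q] = dist[q_next] + 1
                queue.append(q)
    return dist


class LearnedModel:
    """Count-based estimate of the unknown MDP transition probabilities."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, s, a, s_next):
        # Record one observed transition (s, a) -> s_next.
        self.counts[(s, a)][s_next] += 1

    def probs(self, s, a):
        # Maximum-likelihood P_hat(s' | s, a); empty dict if (s, a) never tried.
        row = self.counts.get((s, a), {})
        total = sum(row.values())
        return {s_next: n / total for s_next, n in row.items()} if total else {}


def biased_action(s, q, actions, model, label, transitions, dist, rng=random):
    """Action minimising the expected automaton distance after one step.

    Assumes q' = delta(q, L(s')); a missing automaton transition is treated
    as leading to a rejecting sink (distance INF).
    """
    scores = {}
    for a in actions:
        p = model.probs(s, a)
        if not p:
            continue  # never tried: left to uniform exploration
        scores[a] = sum(
            prob * dist.get(transitions.get((q, label(s_next))), INF)
            for s_next, prob in p.items()
        )
    if not scores:
        return rng.choice(actions)
    return min(scores, key=scores.get)


def select_action(s, q, actions, Q, model, label, transitions, dist,
                  epsilon=0.2, delta=0.5, rng=random):
    """Epsilon-greedy whose exploration step is split into biased and uniform moves."""
    if rng.random() > epsilon:
        return max(actions, key=lambda a: Q[(s, q, a)])  # exploit current Q
    if rng.random() < delta:
        return biased_action(s, q, actions, model, label, transitions, dist, rng)
    return rng.choice(actions)
```

With probability `1 - epsilon` the agent exploits its current Q-values. Otherwise it takes the model-guided action with probability `delta` and a uniformly random action with probability `1 - delta`, so untried state-action pairs are still visited and the estimated model keeps improving.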
