Title

Exploiting Transformer in Sparse Reward Reinforcement Learning for Interpretable Temporal Logic Motion Planning

Authors

Hao Zhang, Hao Wang, Zhen Kan

Abstract

Automaton-based approaches have enabled robots to perform various complex tasks. However, most existing automaton-based algorithms rely heavily on manually customized state representations for the considered task, limiting their applicability in deep reinforcement learning. To address this issue, by incorporating the Transformer into reinforcement learning, we develop a Double-Transformer-guided Temporal Logic framework (T2TL) that exploits the structural feature of the Transformer twice, i.e., first encoding the LTL instruction via a Transformer module for efficient understanding of task instructions during training, and then encoding the context variable via the Transformer again for improved task performance. In particular, the LTL instruction is specified in co-safe LTL. As a semantics-preserving rewriting operation, LTL progression is exploited to decompose the complex task into learnable sub-goals, which not only converts non-Markovian reward decision processes into Markovian ones, but also improves sampling efficiency by learning multiple sub-tasks simultaneously. An environment-agnostic LTL pre-training scheme is further incorporated to facilitate the learning of the Transformer module, resulting in an improved representation of LTL. Simulation results demonstrate the effectiveness of the T2TL framework.
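
The key rewriting operation mentioned in the abstract, LTL progression, rewrites a co-safe LTL formula after each observed set of true propositions so that only the remaining sub-goal is left, which is how the non-Markovian task reward becomes Markovian over (state, formula) pairs. The sketch below is a minimal illustrative Python implementation of that operation under an assumed tuple-based formula encoding; it is not the authors' code, and the example task formula is hypothetical.

```python
# Minimal sketch of LTL progression (Bacchus-and-Kabanza-style rewriting)
# over a simple tuple-based formula encoding; illustrative only.
# Formula encoding (an assumption of this sketch):
#   True / False                      -> Python booleans
#   atomic proposition                -> string, e.g. "a"
#   ("not", p)                        -> negated atomic proposition
#   ("and", f1, f2), ("or", f1, f2)   -> conjunction / disjunction
#   ("next", f), ("until", f1, f2), ("eventually", f), ("always", f)

def progress(formula, label):
    """Rewrite `formula` after observing the set `label` of true propositions."""
    if formula is True or formula is False:
        return formula
    if isinstance(formula, str):                      # atomic proposition
        return formula in label
    op = formula[0]
    if op == "not":                                   # negation of an atom (co-safe LTL)
        return formula[1] not in label
    if op == "and":
        return _and(progress(formula[1], label), progress(formula[2], label))
    if op == "or":
        return _or(progress(formula[1], label), progress(formula[2], label))
    if op == "next":
        return formula[1]
    if op == "until":
        # prog(f1 U f2) = prog(f2) or (prog(f1) and (f1 U f2))
        return _or(progress(formula[2], label),
                   _and(progress(formula[1], label), formula))
    if op == "eventually":
        return _or(progress(formula[1], label), formula)
    if op == "always":
        return _and(progress(formula[1], label), formula)
    raise ValueError(f"unknown operator: {op}")

def _and(a, b):
    # Simplify conjunctions with boolean constants to keep formulas small.
    if a is False or b is False:
        return False
    if a is True:
        return b
    if b is True:
        return a
    return ("and", a, b)

def _or(a, b):
    # Simplify disjunctions with boolean constants to keep formulas small.
    if a is True or b is True:
        return True
    if a is False:
        return b
    if b is False:
        return a
    return ("or", a, b)

# Hypothetical sequential task: "eventually (a and eventually b)",
# i.e., visit region a, then region b.
task = ("eventually", ("and", "a", ("eventually", "b")))
print(progress(task, set()))    # nothing observed yet: task is unchanged
print(progress(task, {"a"}))    # disjunction semantically equivalent to ("eventually", "b")
```

In a framework like T2TL, the progressed formula at each step would be what the Transformer module encodes, and reaching the formula True (task satisfied) or False (task violated) is what yields the otherwise sparse reward signal.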
