Paper Title
Deep Reinforcement Learning based Local Planner for UAV Obstacle Avoidance using Demonstration Data
Paper Authors
Paper Abstract
In this paper, a deep reinforcement learning (DRL) method is proposed to address the problem of UAV navigation in an unknown environment. However, DRL algorithms are limited by the data efficiency problem, as they typically require a huge amount of data before reaching reasonable performance. To speed up the DRL training process, we developed a novel learning framework that combines imitation learning and reinforcement learning, building upon the Twin Delayed DDPG (TD3) algorithm. Both the newly introduced policy and Q-value networks are learned from expert demonstrations during the imitation phase. To tackle the distribution mismatch problem in the transfer from imitation to reinforcement learning, both the TD-error and a decayed imitation loss are used to update the pre-trained networks once the agent starts interacting with the environment. The performance of the proposed algorithm is demonstrated on a challenging 3D UAV navigation problem using depth cameras and benchmarked in a variety of simulation environments.
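The abstract does not include code; the following is a minimal PyTorch sketch of the kind of actor update it describes, where a TD3-style deterministic policy-gradient term is combined with a behaviour-cloning (imitation) loss whose weight decays as environment interaction proceeds. The network sizes, the exponential decay schedule, and all names (`Actor`, `Critic`, `bc_weight_init`, `bc_decay`) are illustrative assumptions, not the authors' implementation.

```python
# Sketch: TD3 actor loss plus a decayed imitation (behaviour-cloning) term.
# Architectures, decay schedule and names are assumptions for illustration only.
import math
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

class Critic(nn.Module):
    """Single Q-network; TD3 itself keeps two of these (clipped double-Q)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def actor_loss(actor, q1, states, demo_states, demo_actions, step,
               bc_weight_init=1.0, bc_decay=1e-4):
    """TD3 policy-gradient loss plus an imitation term whose weight
    decays with the number of environment interaction steps."""
    pg_loss = -q1(states, actor(states)).mean()           # maximise Q1(s, pi(s))
    bc_loss = nn.functional.mse_loss(actor(demo_states), demo_actions)
    bc_weight = bc_weight_init * math.exp(-bc_decay * step)
    return pg_loss + bc_weight * bc_loss

if __name__ == "__main__":
    state_dim, action_dim = 16, 3                 # e.g. depth features + velocity command
    actor, q1 = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
    opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

    states = torch.randn(64, state_dim)           # batch from the replay buffer
    demo_s = torch.randn(64, state_dim)           # batch from the demonstration buffer
    demo_a = torch.rand(64, action_dim) * 2 - 1   # expert actions in [-1, 1]

    loss = actor_loss(actor, q1, states, demo_s, demo_a, step=10_000)
    opt.zero_grad(); loss.backward(); opt.step()
```

During the imitation phase the same demonstration batches could be used alone to pre-train both networks; the decayed `bc_weight` then lets the reinforcement-learning signal gradually take over once online interaction begins, which is one common way to soften the distribution mismatch the abstract mentions.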