Paper Title

On Reward Shaping for Mobile Robot Navigation: A Reinforcement Learning and SLAM Based Approach

Paper Authors

Botteghi, Nicolò, Sirmacek, Beril, Mustafa, Khaled A. A., Poel, Mannes, Stramigioli, Stefano

Paper Abstract

We present a map-less path planning algorithm based on Deep Reinforcement Learning (DRL) for mobile robots navigating in unknown environments that relies only on 40-dimensional raw laser data and odometry information. The planner is trained using a reward function shaped by the online knowledge of the map of the training environment, obtained with a grid-based Rao-Blackwellized particle filter, in an attempt to enhance the obstacle awareness of the agent. The agent is trained in a complex simulated environment and evaluated in two unseen ones. We show that the policy trained using the introduced reward function not only outperforms standard reward functions in terms of convergence speed, reducing the number of iteration steps by 36.9% and lowering the number of collision samples, but also drastically improves the behaviour of the agent in unseen environments, by 23% in a simpler workspace and by 45% in a more cluttered one. Furthermore, the policy trained in the simulation environment can be directly and successfully transferred to the real robot. A video of our experiments can be found at: https://youtu.be/UEV7W6e6ZqI
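The abstract does not spell out the shaped reward itself, so below is a minimal sketch of what a map-aware shaped reward of this kind might look like: a dense goal-progress term, terminal goal/collision rewards, and an obstacle-density penalty read from the online occupancy grid produced by the SLAM front end. All names, window sizes, and coefficients here (shaped_reward, shaping_weight, R_GOAL, R_COLLISION) are illustrative assumptions, not the authors' actual formulation.

```python
import numpy as np

# Hypothetical constants; the paper does not publish its exact coefficients.
R_GOAL = 20.0         # bonus when the goal is reached
R_COLLISION = -20.0   # penalty on collision
GOAL_RADIUS = 0.3     # metres
COLLISION_DIST = 0.2  # metres

def shaped_reward(position, goal, prev_goal_dist, laser_scan, occupancy_grid,
                  grid_origin, grid_resolution, shaping_weight=0.1):
    """One possible map-aware shaped reward.

    `occupancy_grid` stands in for the online map estimated by the
    grid-based Rao-Blackwellized particle filter; all names and weights
    in this function are illustrative assumptions.
    """
    goal_dist = np.linalg.norm(goal - position)

    # Terminal cases: reaching the goal or colliding with an obstacle.
    if goal_dist < GOAL_RADIUS:
        return R_GOAL
    if np.min(laser_scan) < COLLISION_DIST:
        return R_COLLISION

    # Dense progress term: positive when the agent moves towards the goal.
    progress = prev_goal_dist - goal_dist

    # Map-based shaping term: penalise occupied cells near the robot to
    # raise obstacle awareness, using the occupancy estimate around the
    # current pose (a 5x5 cell window here, purely for illustration).
    cell = np.floor((position - grid_origin) / grid_resolution).astype(int)
    r0, c0 = max(cell[1] - 2, 0), max(cell[0] - 2, 0)
    window = occupancy_grid[r0:cell[1] + 3, c0:cell[0] + 3]
    obstacle_density = float(np.mean(window > 0.5)) if window.size else 0.0

    return progress - shaping_weight * obstacle_density

# Example call with a toy 100x100 occupancy grid and a 40-beam scan,
# matching the 40-dimensional laser input mentioned in the abstract:
r = shaped_reward(np.array([1.0, 2.0]), np.array([4.0, 5.0]),
                  prev_goal_dist=4.3, laser_scan=np.full(40, 3.0),
                  occupancy_grid=np.zeros((100, 100)),
                  grid_origin=np.array([0.0, 0.0]), grid_resolution=0.05)
```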
