Spatial-temporal recurrent reinforcement learning for autonomous ships

Paper title

Spatial-temporal recurrent reinforcement learning for autonomous ships

Authors

Martin Waltz, Ostap Okhrin

Abstract

This paper proposes a spatial-temporal recurrent neural network architecture for deep $Q$-networks that can be used to steer an autonomous ship. The network design makes it possible to handle an arbitrary number of surrounding target ships while offering robustness to partial observability. Furthermore, a state-of-the-art collision risk metric is proposed to enable an easier assessment of different situations by the agent. The COLREG rules of maritime traffic are explicitly considered in the design of the reward function. The final policy is validated on a custom set of newly created single-ship encounters called 'Around the Clock' problems and the commonly used Imazu (1987) problems, which include 18 multi-ship scenarios. Performance comparisons with artificial potential field and velocity obstacle methods demonstrate the potential of the proposed approach for maritime path planning. Furthermore, the new architecture exhibits robustness when it is deployed in multi-agent scenarios and it is compatible with other deep reinforcement learning algorithms, including actor-critic frameworks.
