Paper Title
A New Approach to Training Multiple Cooperative Agents for Autonomous Driving
Paper Authors
Paper Abstract
Training multiple agents to perform safe and cooperative control in complex autonomous driving scenarios remains a challenge. For a small fleet of cars moving together, this paper proposes Lepus, a new approach to training multiple agents. Lepus trains the agents in a purely cooperative manner, with policy networks that share parameters and a reward function shared by all agents. In particular, Lepus pre-trains the policy networks through an adversarial process, which improves their collaborative decision-making capability and, in turn, the stability of driving. Moreover, to alleviate the problem of sparse rewards, Lepus learns an approximate reward function from expert trajectories by combining a random network and a distillation network. We conduct extensive experiments on the MADRaS simulation platform. The experimental results show that multiple agents trained by Lepus avoid as many collisions as possible while driving simultaneously and outperform four other methods, namely DDPG-FDE, PSDDPG, MADDPG, and MAGAIL(DDPG), in terms of stability.
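The abstract does not spell out how the random network and the distillation network are combined, so the following is only a rough sketch of one common way such a pair can yield a dense reward. It assumes that a fixed random target network is distilled into a predictor using expert (state, action) vectors only, and that the reward is `exp(-sigma * prediction_error)`. The network sizes, the ridge-regression fit, the reward form, and all names (`target`, `features`, `fit_predictor`, `reward`, `sigma`) are illustrative assumptions, not details taken from Lepus.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, HIDDEN, OUT_DIM = 4, 64, 8  # hypothetical sizes for a (state, action) vector

# Fixed random target network: initialised once and never trained.
W1 = rng.normal(size=(IN_DIM, HIDDEN)) / np.sqrt(IN_DIM)
W2 = rng.normal(size=(HIDDEN, OUT_DIM)) / np.sqrt(HIDDEN)

def target(x):
    return np.tanh(x @ W1) @ W2

# Distillation (predictor) network: its own random hidden layer plus a
# trainable linear output layer. Ridge regression replaces gradient descent
# here purely to keep the sketch short (an assumption of this example).
V = rng.normal(size=(IN_DIM, HIDDEN)) / np.sqrt(IN_DIM)

def features(x):
    return np.tanh(x @ V)

def fit_predictor(expert_x, ridge=1e-3):
    """Fit the output layer so the predictor matches the target on expert data."""
    phi = features(expert_x)
    gram = phi.T @ phi + ridge * np.eye(HIDDEN)
    return np.linalg.solve(gram, phi.T @ target(expert_x))

def reward(x, C, sigma=1.0):
    """Low prediction error means the input resembles expert data: reward near 1."""
    err = np.mean((features(x) @ C - target(x)) ** 2, axis=-1)
    return np.exp(-sigma * err)

# Synthetic stand-ins for expert and non-expert (state, action) vectors.
expert_x = rng.normal(scale=0.1, size=(500, IN_DIM))
novel_x = rng.normal(loc=3.0, scale=1.0, size=(500, IN_DIM))

C = fit_predictor(expert_x)
r_expert = reward(expert_x, C)
r_novel = reward(novel_x, C)
print("mean reward on expert-like inputs:", r_expert.mean())
print("mean reward on unfamiliar inputs: ", r_novel.mean())
```

Because the predictor is fitted only on expert data, its error stays small near expert behaviour and grows elsewhere, which is what would turn a sparse task reward into a dense learned signal under these assumptions.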