Distributed Multi-agent Meta Learning for Trajectory Design in Wireless Drone Networks

Paper Title

Distributed Multi-agent Meta Learning for Trajectory Design in Wireless Drone Networks

Authors

Ye Hu, Mingzhe Chen, Walid Saad, H. Vincent Poor, Shuguang Cui

Abstract

In this paper, the problem of trajectory design for a group of energy-constrained drones operating in dynamic wireless network environments is studied. In the considered model, a team of drone base stations (DBSs) is dispatched to cooperatively serve clusters of ground users that have dynamic and unpredictable uplink access demands. In this scenario, the DBSs must cooperatively navigate the considered area to maximize coverage of the ground users' dynamic requests. This trajectory design problem is posed as an optimization framework whose goal is to find optimal trajectories that maximize the fraction of users served by all DBSs. To find an optimal solution to this non-convex optimization problem under unpredictable environments, a value-decomposition-based reinforcement learning (VD-RL) solution coupled with a meta-training mechanism is proposed. This algorithm allows the DBSs to dynamically learn their trajectories while generalizing their learning to unseen environments. Analytical results show that the proposed VD-RL algorithm is guaranteed to converge to a locally optimal solution of the non-convex optimization problem. Simulation results show that, even without meta-training, the proposed VD-RL algorithm achieves a 53.2% improvement in service coverage and a 30.6% improvement in convergence speed compared to baseline multi-agent algorithms. Meanwhile, the use of meta-learning improves the convergence speed of the VD-RL algorithm by up to 53.8% when the DBSs must deal with a previously unseen task.
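
The abstract does not say how the joint value is decomposed across DBSs. As a rough illustration only, the sketch below assumes the simplest additive form, in which the team value is the sum of per-agent action values. Under that assumption, each agent choosing its own best action also maximizes the team value, which is what makes distributed execution possible. The function names and the numeric values are hypothetical and are not taken from the paper.

```python
from itertools import product


def joint_q(per_agent_q, joint_action):
    """Assumed additive decomposition: team value = sum of per-agent values."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))


def decentralized_greedy(per_agent_q):
    """Each agent picks its own best action using only its own values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)


def centralized_greedy(per_agent_q):
    """Brute-force search over all joint actions, for comparison only."""
    action_sets = [range(len(q)) for q in per_agent_q]
    return max(product(*action_sets), key=lambda ja: joint_q(per_agent_q, ja))


# Hypothetical action values for three DBSs with four candidate moves each.
per_agent_q = [
    [0.1, 0.7, 0.3, 0.2],
    [0.5, 0.2, 0.9, 0.4],
    [0.6, 0.1, 0.3, 0.8],
]

best = decentralized_greedy(per_agent_q)
# With an additive decomposition, the per-agent choice matches the joint optimum.
assert best == centralized_greedy(per_agent_q)
```

The brute-force search grows exponentially with the number of agents, while the per-agent choice grows linearly. The paper's actual decomposition and training procedure may differ from this additive sketch.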
