Paper Title

Reinforcement Learning-based Joint User Scheduling and Link Configuration in Millimeter-wave Networks

Authors

Yi Zhang, Robert W. Heath Jr.

Abstract

In this paper, we develop algorithms for joint user scheduling and three types of mmWave link configuration: relay selection, codebook optimization, and beam tracking in millimeter wave (mmWave) networks. Our goal is to design an online controller that dynamically schedules users and configures their links to minimize the system delay. To solve this complex scheduling problem, we model it as a dynamic decision-making process and develop two reinforcement learning-based solutions. The first solution is based on deep reinforcement learning (DRL), which leverages proximal policy optimization to train a neural network-based solution. Due to the potentially high sample complexity of DRL, we also propose an empirical multi-armed bandit (MAB)-based solution, which decomposes the decision-making process into a sequence of sub-actions and exploits classic MaxWeight scheduling and Thompson sampling to decide those sub-actions. Our evaluation of the proposed solutions confirms their effectiveness in providing acceptable system delay. It also shows that the DRL-based solution has better delay performance while the MAB-based solution has a faster training process.
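The MAB-based solution mentioned above builds on two classic primitives: MaxWeight scheduling and Thompson sampling. The following is a minimal sketch of those two primitives only, not the paper's actual algorithm; the function names and the Beta-Bernoulli reward model are illustrative assumptions.

```python
import random

def max_weight_user(queue_lengths, link_rates):
    # Classic MaxWeight scheduling: serve the user whose
    # backlog x achievable-rate product is largest.
    weights = [q * r for q, r in zip(queue_lengths, link_rates)]
    return weights.index(max(weights))

def thompson_sample_arm(alpha, beta):
    # Thompson sampling over Bernoulli arms: draw one sample from
    # each arm's Beta posterior and play the arm with the largest draw.
    draws = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
    return draws.index(max(draws))

def update_posterior(alpha, beta, arm, reward):
    # Beta-Bernoulli conjugate update: a success increments alpha,
    # a failure increments beta.
    if reward:
        alpha[arm] += 1
    else:
        beta[arm] += 1
```

In a decomposition of the kind the abstract describes, each slot's decision could first pick a user via `max_weight_user`, then pick a link sub-action (e.g., a relay or codebook choice, modeled here as a bandit arm) via `thompson_sample_arm`, updating the posterior from the observed link outcome.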
