Title
Hierarchical Reinforcement Learning for Relay Selection and Power Optimization in Two-Hop Cooperative Relay Network
Authors
Abstract
Cooperative communication is an effective approach to improving spectrum utilization. To reduce the outage probability of a communication system, most studies propose various relay selection and power allocation schemes that rely on the assumption of known channel state information (CSI). However, accurate CSI is difficult to obtain in practice. In this paper, we study the outage probability minimization problem subject to a total transmission power constraint in a two-hop cooperative relay network. We use reinforcement learning (RL) methods to learn strategies for relay selection and power allocation that require no prior knowledge of CSI and rely solely on interaction with the communication environment. Notably, conventional RL methods, including most deep reinforcement learning (DRL) methods, perform poorly when the search space is too large. We therefore first propose a DRL framework with an outage-based reward function, which serves as a baseline, and then propose a hierarchical reinforcement learning (HRL) framework and training algorithm. A key difference from other RL-based methods in the existing literature is that our HRL approach decomposes relay selection and power allocation into two hierarchical optimization objectives that are trained at different levels. By simplifying the search space, the HRL approach solves the sparse-reward problem on which conventional RL methods fail. Simulation results reveal that, compared with the traditional DRL method, the HRL training algorithm reaches convergence 30 training iterations earlier and reduces the outage probability by 5% in a two-hop relay network with the same outage threshold.
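The abstract's central idea, splitting the joint relay/power search into a high level that picks a relay and a low level that allocates power under the total-power budget, with both levels driven by an outage-based reward, can be illustrated with a minimal tabular sketch. Everything below is an assumption for illustration only: the Rayleigh-like channel model, the power discretization, the bandit-style Q-value updates, and all names (N_RELAYS, outage_reward, etc.) are hypothetical stand-ins, not the paper's DRL/HRL architecture.

```python
import numpy as np

# Minimal sketch of the hierarchical decomposition described in the abstract.
# The high level selects a relay; the low level selects a power split for
# that relay; both learn from the same outage-based reward (1 = no outage).

rng = np.random.default_rng(0)

N_RELAYS = 4          # candidate relays (assumed)
N_POWER_LEVELS = 8    # discretized power splits under the total-power budget
OUTAGE_SNR = 1.0      # outage threshold (assumed units)

# Q-tables: high level over relays, low level over (relay, power level).
q_high = np.zeros(N_RELAYS)
q_low = np.zeros((N_RELAYS, N_POWER_LEVELS))

def outage_reward(relay, power_level):
    """Simulated two-hop decode-and-forward link; the agent never sees the
    channel gains, mimicking the no-CSI setting of the abstract."""
    frac = (power_level + 1) / (N_POWER_LEVELS + 1)   # source power fraction
    g1 = rng.exponential(1.0 + 0.3 * relay)           # source->relay gain
    g2 = rng.exponential(1.5 - 0.2 * relay)           # relay->destination gain
    snr = min(frac * g1, (1.0 - frac) * g2)           # bottleneck hop SNR
    return 1.0 if snr > OUTAGE_SNR else 0.0

alpha, eps = 0.05, 0.1
for episode in range(20000):
    # High level: choose a relay (epsilon-greedy).
    relay = rng.integers(N_RELAYS) if rng.random() < eps else int(np.argmax(q_high))
    # Low level: choose a power split conditioned on that relay.
    row = q_low[relay]
    p = rng.integers(N_POWER_LEVELS) if rng.random() < eps else int(np.argmax(row))
    r = outage_reward(relay, p)
    # Both levels update from the shared outage-based reward.
    q_low[relay, p] += alpha * (r - q_low[relay, p])
    q_high[relay] += alpha * (r - q_high[relay])

print("best relay:", int(np.argmax(q_high)))
print("best power level per relay:", np.argmax(q_low, axis=1))
```

The sketch shows why the decomposition shrinks the search space: each level searches its own small action set (4 relays, 8 power levels) instead of the 32-entry joint space, which is the effect the abstract credits for mitigating sparse rewards.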