Paper Title

Toward Safe and Accelerated Deep Reinforcement Learning for Next-Generation Wireless Networks

Paper Authors

Nagib, Ahmad M., Abou-zeid, Hatem, Hassanein, Hossam S.

Paper Abstract

Deep reinforcement learning (DRL) algorithms have recently gained wide attention in the wireless networks domain. They are considered promising approaches for solving dynamic radio resource management (RRM) problems in next-generation networks. Given their capability to build an approximate and continuously updated model of wireless network environments, DRL algorithms can deal with the multifaceted complexity of such environments. Nevertheless, several challenges hinder the practical adoption of DRL in commercial networks. In this article, we first discuss two key practical challenges that are faced but rarely tackled when developing DRL-based RRM solutions. We argue that these DRL-related challenges must be addressed for DRL to find its way into commercial RRM solutions. In particular, we discuss the need for safe and accelerated DRL-based RRM solutions that mitigate the slow convergence and performance instability exhibited by DRL algorithms. We then review and categorize the main approaches used in the RRM domain to develop safe and accelerated DRL-based solutions. Finally, a case study is conducted to demonstrate the importance of having safe and accelerated DRL-based RRM solutions. We employ multiple variants of transfer learning (TL) techniques to accelerate the convergence of DRL-based controllers for intelligent radio access network (RAN) slicing. We also propose a hybrid TL-based approach and sigmoid function-based rewards as examples of safe exploration in DRL-based RAN slicing.
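To make the two mechanisms named in the abstract more concrete, the sketch below (Python, using NumPy and PyTorch) illustrates, first, a saturating sigmoid-shaped reward that decays smoothly as a latency SLA threshold is exceeded, keeping the reward signal bounded during exploration, and second, policy transfer, where a DRL agent for a new RAN slicing scenario is initialized from weights trained on a source scenario rather than from a random initialization. This is a minimal sketch under stated assumptions: the function and class names, the latency threshold, the steepness value, and the network sizes are illustrative choices, not the paper's actual reward formulation or TL design.

```python
import numpy as np
import torch
import torch.nn as nn

# --- Sigmoid-shaped reward (illustrative formulation, not the paper's) ---
# A saturating reward keeps the signal bounded and decays smoothly near the
# SLA violation point, which is one way to discourage unsafe exploration.
def sigmoid_reward(latency_ms: float,
                   sla_threshold_ms: float = 50.0,   # assumed SLA threshold
                   steepness: float = 0.2) -> float:  # assumed steepness
    """Reward approaches 1 well below the SLA latency threshold and
    decays smoothly toward 0 as the threshold is exceeded."""
    return 1.0 / (1.0 + np.exp(steepness * (latency_ms - sla_threshold_ms)))

# --- Policy-transfer sketch ---
# A minimal DQN-style Q-network; transfer learning here means initializing
# the target-scenario agent with weights learned on a source RAN scenario,
# then fine-tuning, instead of training from scratch.
class QNetwork(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

source_q = QNetwork(state_dim=8, n_actions=4)  # trained on a source scenario
target_q = QNetwork(state_dim=8, n_actions=4)  # agent for the new scenario

# Transfer: start the new agent from the source policy's weights.
target_q.load_state_dict(source_q.state_dict())

# Optionally freeze the first layer and fine-tune only the remaining ones,
# one common TL variant when source and target tasks are closely related.
for param in target_q.net[0].parameters():
    param.requires_grad = False
```

Freezing early layers and fine-tuning only the later ones is one common TL variant when the source and target tasks are closely related; fully fine-tuning all layers is the usual alternative when the two scenarios differ more substantially.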
