Paper Title
Decentralized Learning for Channel Allocation in IoT Networks over Unlicensed Bandwidth as a Contextual Multi-player Multi-armed Bandit Game
Paper Authors
Abstract
We study a decentralized channel allocation problem in an ad-hoc Internet of Things (IoT) network underlaying the spectrum licensed to a primary cellular network. In the considered network, the limited channel sensing/probing capability and computational resources of the IoT devices make it difficult for them to acquire detailed Channel State Information (CSI) for the shared channels. In practice, the unknown patterns of the primary users' transmission activities and the time-varying CSI (e.g., due to small-scale fading or device mobility) also cause stochastic changes in channel quality. Decentralized IoT links are thus expected to learn the channel conditions online from partial observations, while acquiring no information about the channels that they are not operating on. They also have to reach an efficient, collision-free channel allocation with only limited coordination. Our study maps this problem onto a contextual multi-player multi-armed bandit game and proposes a purely decentralized, three-stage policy learning algorithm based on trial-and-error. Theoretical analysis shows that the proposed scheme guarantees that the IoT links jointly converge to the socially optimal channel allocation with sub-linear (i.e., polylogarithmic) regret with respect to the operational time. Simulations demonstrate that it strikes a good balance between efficiency and network scalability compared with other state-of-the-art decentralized bandit algorithms.
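To make the problem setting concrete, the following is a minimal toy sketch of the multi-player multi-armed bandit feedback model described above: each link independently runs plain UCB1 on its own observed rewards, and a collision (two links choosing the same channel) yields zero reward. All names and the channel quality values are hypothetical illustrations; this is not the paper's three-stage algorithm, and the naive independent-learner behavior it exhibits (persistent collisions) is precisely what a coordinated policy like the one proposed in the paper is designed to avoid.

```python
import math
import random


def simulate(num_links=2, num_channels=3, horizon=2000, seed=0):
    """Toy decentralized multi-player bandit with collision feedback.

    Each link keeps only its own per-channel sample counts and reward
    sums (partial observations), picks a channel via the UCB1 index,
    and observes zero reward on a collision. Returns the average
    per-round network reward. Channel means are hypothetical.
    """
    rng = random.Random(seed)
    means = [0.9, 0.6, 0.3][:num_channels]  # unknown to the links
    counts = [[0] * num_channels for _ in range(num_links)]
    sums = [[0.0] * num_channels for _ in range(num_links)]
    total = 0.0
    for t in range(1, horizon + 1):
        choices = []
        for i in range(num_links):
            untried = [k for k in range(num_channels) if counts[i][k] == 0]
            if untried:
                k = rng.choice(untried)  # play each channel once first
            else:
                # UCB1 index: empirical mean + exploration bonus
                k = max(
                    range(num_channels),
                    key=lambda a: sums[i][a] / counts[i][a]
                    + math.sqrt(2.0 * math.log(t) / counts[i][a]),
                )
            choices.append(k)
        for i, k in enumerate(choices):
            collided = choices.count(k) > 1
            reward = 0.0 if collided else float(rng.random() < means[k])
            counts[i][k] += 1
            sums[i][k] += reward
            total += reward
    return total / horizon
```

Running `simulate()` gives the average network throughput per round; with 2 links and 3 channels it is upper-bounded by 2.0, and losses relative to the collision-free optimum come from exploration and uncoordinated collisions.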