Deep Reinforcement Learning for IRS Phase Shift Design in Spatiotemporally Correlated Environments

Paper title

Deep Reinforcement Learning for IRS Phase Shift Design in Spatiotemporally Correlated Environments

Authors

Spilios Evmorfos, Athina P. Petropulu, H. Vincent Poor

Abstract

The paper studies the problem of designing the Intelligent Reflecting Surface (IRS) phase shifters for Multiple Input Single Output (MISO) communication systems in spatiotemporally correlated channel environments, where the destination can move within a confined area. The objective is to maximize the expected sum of SNRs at the receiver over an infinite time horizon. The problem formulation gives rise to a Markov Decision Process (MDP). We propose a deep actor-critic algorithm that accounts for channel correlations and destination motion by constructing the state representation to include the current position of the receiver together with the phase shift values and receiver positions from a window of previous time steps. The channel variability induces high-frequency components in the spectrum of the underlying value function. We propose preprocessing the critic's input with a Fourier kernel, which enables stable value learning. Finally, we investigate the use of the destination SNR as a component of the designed MDP state, which is common practice in previous work. We provide empirical evidence that, when the channels are spatiotemporally correlated, including the SNR in the state representation interacts with function approximation in ways that inhibit convergence.
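The two design choices described in the abstract, a windowed state and Fourier preprocessing of the critic's input, can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class names `WindowedState` and `FourierFeatures`, the zero-padding of the history before the first steps, the random Gaussian frequency matrix `B`, and the `scale` parameter are all hypothetical. The abstract does not say how the Fourier kernel is parameterized, so a random Fourier feature mapping is used here as one common stand-in.

```python
from collections import deque

import numpy as np


class WindowedState:
    """Hypothetical state builder: current receiver position plus a window
    of past (phase shifts, receiver position) pairs."""

    def __init__(self, n_elements, pos_dim, window):
        # Assumption: the history is zero-padded until `window` steps exist.
        self.history = deque(
            [np.zeros(n_elements + pos_dim) for _ in range(window)],
            maxlen=window,
        )

    def push(self, phases, position):
        # Store the phase shifts applied and the receiver position observed
        # at one time step; the oldest entry is dropped automatically.
        self.history.append(np.concatenate([phases, position]))

    def state(self, current_position):
        # Note that the SNR is deliberately not part of the state here.
        return np.concatenate([current_position, *self.history])


class FourierFeatures:
    """Hypothetical Fourier preprocessing: x -> [cos(2*pi*Bx), sin(2*pi*Bx)]."""

    def __init__(self, in_dim, n_features, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: frequencies are drawn once from N(0, scale^2) and fixed.
        self.B = rng.normal(0.0, scale, size=(n_features, in_dim))

    def __call__(self, x):
        proj = 2.0 * np.pi * (self.B @ x)
        return np.concatenate([np.cos(proj), np.sin(proj)])


# Usage: 4 IRS elements, 2-D receiver position, window of 3 past steps.
tracker = WindowedState(n_elements=4, pos_dim=2, window=3)
tracker.push(np.array([0.1, 0.2, 0.3, 0.4]), np.array([1.0, 2.0]))
s = tracker.state(np.array([1.5, 2.5]))
action = np.zeros(4)  # candidate phase shifts proposed by the actor
ff = FourierFeatures(in_dim=s.size + action.size, n_features=16)
critic_input = ff(np.concatenate([s, action]))
```

In this sketch the critic network would consume `critic_input` instead of the raw state-action vector. A larger `scale` spreads the sampled frequencies more widely, which is the usual lever for letting a network fit higher-frequency structure in its target function.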
