强大字母的增强学习评估和解决方案的解决方案

论文标题

强大字母的增强学习评估和解决方案的解决方案

Reinforcement Learning Evaluation and Solution for the Feedback Capacity of the Ising Channel with Large Alphabet

论文作者

Aharoni, Ziv, Sabag, Oron, Permuter, Haim Henri

论文摘要

我们提出了一种新方法，以使用增强学习（RL）的内存来计算Unifilar有限状态通道（FSC）的反馈能力。先前使用其公式作为马尔可夫决策过程（MDP）估算了反馈能力，并具有动态编程（DP）算法。但是，它们的计算复杂性随通道字母大小而成倍增长。因此，我们使用RL，特别是其具有神经网络的价值函数和策略的参数能力，以数值评估具有较大字母大小的通道的反馈能力。 RL算法的结果是反馈容量的数值下限，用于揭示最佳解决方案的结构。该结构是由基于图的辅助随机变量建模的，该变量可用于与双重性结合的反馈容量来得出分析上限。通过测试BCJR是否不变来验证上限的紧密度来结束容量计算。我们在具有任意字母大小的ISING通道上演示了此方法。对于小于或等于8的字母大小，我们得出容量的分析解。接下来，使用数值解决方案的结构来推导一种简单的编码方案，该方案可实现反馈能力并用作较大字母的下限。对于大于8的字母大小，我们在反馈容量上呈现上限。对于渐近的字母大小，我们提出了渐近的最佳编码方案。

We propose a new method to compute the feedback capacity of unifilar finite state channels (FSCs) with memory using reinforcement learning (RL). The feedback capacity was previously estimated using its formulation as a Markov decision process (MDP) with dynamic programming (DP) algorithms. However, their computational complexity grows exponentially with the channel alphabet size. Therefore, we use RL, and specifically its ability to parameterize value functions and policies with neural networks, to evaluate numerically the feedback capacity of channels with a large alphabet size. The outcome of the RL algorithm is a numerical lower bound on the feedback capacity, which is used to reveal the structure of the optimal solution. The structure is modeled by a graph-based auxiliary random variable that is utilized to derive an analytic upper bound on the feedback capacity with the duality bound. The capacity computation is concluded by verifying the tightness of the upper bound by testing whether it is BCJR invariant. We demonstrate this method on the Ising channel with an arbitrary alphabet size. For an alphabet size smaller than or equal to 8, we derive the analytic solution of the capacity. Next, the structure of the numerical solution is used to deduce a simple coding scheme that achieves the feedback capacity and serves as a lower bound for larger alphabets. For an alphabet size greater than 8, we present an upper bound on the feedback capacity. For an asymptotically large alphabet size, we present an asymptotic optimal coding scheme.

下载PDF全文

下载文献需遵守相关版权规定

论文标题