Paper Title
Deep Reinforcement Learning-based Rebalancing Policies for Profit Maximization of Relay Nodes in Payment Channel Networks
Paper Authors
Paper Abstract
Payment channel networks (PCNs) are a layer-2 blockchain scalability solution whose main entity, the payment channel, enables transactions between pairs of nodes "off-chain," thus reducing the burden on the layer-1 network. Nodes with multiple channels can serve as relays for multi-hop payments by providing their liquidity and withholding part of the payment amount as a fee. Relay nodes might, after a while, end up with one or more unbalanced channels and thus need to trigger a rebalancing operation. In this paper, we study how a relay node can maximize its profits from fees by using the rebalancing method of submarine swaps. We introduce a stochastic model to capture the dynamics of a relay node observing random transaction arrivals and performing occasional rebalancing operations, and express the system evolution as a Markov Decision Process. We formulate the problem of maximizing the node's fortune over time, over all rebalancing policies, and approximate the optimal solution by designing a Deep Reinforcement Learning (DRL)-based rebalancing policy. We build a discrete-event simulator of the system and use it to demonstrate the DRL policy's superior performance under most conditions through a comparative study of different policies and parameterizations. Our work is the first to introduce DRL for liquidity management in the complex world of PCNs.
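To make the abstract's MDP framing concrete, the sketch below shows a toy relay-node environment: the state holds the local balances of two channels plus on-chain funds, each step observes one random payment arrival that earns a proportional fee if liquidity suffices, and the action buys liquidity into a channel at a fixed swap cost. The class name `ToyRelayEnv`, all parameter values, and the simplified payment and swap dynamics are assumptions for exposition only, not the paper's actual model or simulator.

```python
# Illustrative sketch only: a toy relay-node MDP in the spirit of the abstract.
# All names, fee values, and dynamics are assumed for exposition, not taken
# from the paper's model or discrete-event simulator.
import random
from dataclasses import dataclass


@dataclass
class RelayState:
    balance_left: float    # local balance in the channel toward neighbor L
    balance_right: float   # local balance in the channel toward neighbor R
    fortune_onchain: float # on-chain funds available for swaps (and collected fees)


class ToyRelayEnv:
    """Toy MDP: relay random L->R / R->L payments for a fee, or refill a
    depleted channel via a simplified submarine swap paid from on-chain funds."""

    def __init__(self, capacity=100.0, fee_rate=0.01, swap_fee=0.5, seed=0):
        self.capacity = capacity   # per-channel capacity (assumed)
        self.fee_rate = fee_rate   # proportional relay fee (assumed)
        self.swap_fee = swap_fee   # fixed cost per submarine swap (assumed)
        self.rng = random.Random(seed)
        self.state = RelayState(capacity / 2, capacity / 2, 10.0)

    def step(self, rebalance_amounts):
        """rebalance_amounts = (to_left, to_right): liquidity bought into each
        channel this epoch; the subsequent payment arrival is random."""
        s, reward = self.state, 0.0
        # Rebalancing action: move on-chain funds into a channel, pay swap fee.
        for amt, side in zip(rebalance_amounts, ("balance_left", "balance_right")):
            if amt > 0 and s.fortune_onchain >= amt + self.swap_fee:
                setattr(s, side, min(self.capacity, getattr(s, side) + amt))
                s.fortune_onchain -= amt + self.swap_fee
                reward -= self.swap_fee
        # One random payment arrival; direction and size assumed uniform.
        amount = self.rng.uniform(1.0, 10.0)
        if self.rng.random() < 0.5:
            # Payment flowing left -> right: receive on left, send on right.
            if s.balance_right >= amount:
                s.balance_right -= amount
                s.balance_left = min(self.capacity, s.balance_left + amount)
                fee = self.fee_rate * amount
                s.fortune_onchain += fee
                reward += fee
        else:
            # Payment flowing right -> left: receive on right, send on left.
            if s.balance_left >= amount:
                s.balance_left -= amount
                s.balance_right = min(self.capacity, s.balance_right + amount)
                fee = self.fee_rate * amount
                s.fortune_onchain += fee
                reward += fee
        return s, reward


# Usage: a "do nothing" action; a DRL agent would instead choose the swap amounts.
env = ToyRelayEnv()
state, r = env.step((0.0, 0.0))
```

In this toy framing, the reward per step is the fee earned minus any swap cost paid, so a policy maximizing the cumulative reward is maximizing the node's fortune over time, which is the objective the abstract describes.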