Distributed Bandits: Probabilistic Communication on $d$-regular Graphs

Paper title

Distributed Bandits: Probabilistic Communication on $d$-regular Graphs

Authors

Madhushani, Udari; Leonard, Naomi Ehrich

Abstract

We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.
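
The setting described in the abstract can be made concrete with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' algorithm: it uses a plain UCB1 index over all rewards an agent has seen, a circulant graph as one example of a $d$-regular graph (even $d$ only), unit-variance Gaussian rewards, and an independent draw with probability $p$ for each directed observation. The names `ring_neighbors` and `simulate` and all default parameters are hypothetical.

```python
import math
import random


def ring_neighbors(n_agents, d):
    """Neighbor lists for a circulant d-regular graph (assumed topology)."""
    if d % 2 != 0 or d >= n_agents:
        raise ValueError("this sketch needs even d with d < n_agents")
    half = d // 2
    return [
        [(i + k) % n_agents for k in range(-half, half + 1) if k != 0]
        for i in range(n_agents)
    ]


def simulate(means, n_agents=6, d=2, p=0.5, horizon=500, seed=0):
    """Return the cumulative expected group regret after `horizon` steps."""
    rng = random.Random(seed)
    n_arms = len(means)
    nbrs = ring_neighbors(n_agents, d)
    # Per-agent statistics over every reward the agent has observed,
    # whether it pulled the arm itself or heard about it from a neighbor.
    counts = [[0] * n_arms for _ in range(n_agents)]
    sums = [[0.0] * n_arms for _ in range(n_agents)]
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        choices, rewards = [], []
        for i in range(n_agents):
            untried = [a for a in range(n_arms) if counts[i][a] == 0]
            if untried:
                arm = untried[0]  # observe every arm at least once
            else:
                # Standard UCB1 index (assumption: not the paper's index).
                arm = max(
                    range(n_arms),
                    key=lambda a: sums[i][a] / counts[i][a]
                    + math.sqrt(2.0 * math.log(t) / counts[i][a]),
                )
            choices.append(arm)
            rewards.append(rng.gauss(means[arm], 1.0))
            regret += best - means[arm]

        for i in range(n_agents):
            # Each agent always sees its own reward.
            counts[i][choices[i]] += 1
            sums[i][choices[i]] += rewards[i]
            # Each neighbor's latest reward arrives with probability p.
            for j in nbrs[i]:
                if rng.random() < p:
                    counts[i][choices[j]] += 1
                    sums[i][choices[j]] += rewards[j]
    return regret


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, simulate([0.2, 0.5, 0.9], p=p))
```

Running the script prints the group regret for three values of $p$. Setting `p=0.0` corresponds to agents that never communicate, and `p=1.0` to fully reliable links; a single seeded run is only illustrative, so any comparison should be averaged over many seeds.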

Paper title