Paper title
On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
Paper authors
Paper abstract
We consider the nonstochastic multi-agent multi-armed bandit problem in which agents collaborate over a communication network with delays. We establish a lower bound on the individual regret of all agents. We show that, with suitable regularizers and communication protocols, a collaborative multi-agent follow-the-regularized-leader (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor when the number of arms is large enough relative to the degrees of the agents in the communication graph. We also show that an FTRL algorithm with a suitable regularizer is regret-optimal with respect to the scaling with the edge-delay parameter. We present numerical experiments that validate our theoretical results and demonstrate cases in which our algorithms outperform previously proposed algorithms.
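For readers unfamiliar with FTRL in the bandit setting, the following is a minimal single-agent sketch, not the cooperative algorithm of the paper. It assumes a negative-entropy regularizer (chosen here only because the FTRL minimizer then has a closed form, the exponential-weights distribution) and standard importance-weighted loss estimates. The abstract does not specify the paper's regularizers or communication protocol, and they are not modeled; the names `ftrl_neg_entropy` and `run_bandit` are hypothetical.

```python
import math
import random


def ftrl_neg_entropy(cum_loss_est, eta):
    """One FTRL step over the probability simplex.

    Solves argmin_p <p, L> + (1/eta) * sum_i p_i * log(p_i), whose
    closed-form solution is p_i proportional to exp(-eta * L_i).
    """
    shift = min(cum_loss_est)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (l - shift)) for l in cum_loss_est]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(losses, eta, seed=0):
    """Play a loss sequence with bandit feedback.

    losses[t][i] is the loss of arm i at round t, assumed to lie in [0, 1].
    Returns (total incurred loss, final sampling distribution).
    """
    rng = random.Random(seed)
    num_arms = len(losses[0])
    cum_loss_est = [0.0] * num_arms
    total_loss = 0.0
    for loss_t in losses:
        p = ftrl_neg_entropy(cum_loss_est, eta)
        arm = rng.choices(range(num_arms), weights=p)[0]
        total_loss += loss_t[arm]
        # Only the pulled arm's loss is observed; dividing by its sampling
        # probability keeps the estimate unbiased.
        cum_loss_est[arm] += loss_t[arm] / p[arm]
    return total_loss, ftrl_neg_entropy(cum_loss_est, eta)


if __name__ == "__main__":
    T, K = 2000, 3
    # Assumed toy instance: arm 0 always has loss 0, the others loss 1.
    toy_losses = [[0.0, 1.0, 1.0] for _ in range(T)]
    eta = math.sqrt(math.log(K) / (T * K))
    total, final_p = run_bandit(toy_losses, eta)
    print(total, final_p)
```

In a cooperative variant, each agent would additionally fold in loss observations received from its neighbors after the network delay; how that is done, and with which regularizer, is the subject of the paper and is left out of this sketch.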