Paper Title

Kernel Methods for Cooperative Multi-Agent Contextual Bandits

Paper Authors

Abhimanyu Dubey, Alex Pentland

Paper Abstract

Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays. In this paper, we consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS), and a group of agents must cooperate to collectively solve their unique decision problems. For this problem, we propose Coop-KernelUCB, an algorithm that provides near-optimal bounds on the per-agent regret, and is both computationally and communicatively efficient. For special cases of the cooperative problem, we also provide variants of Coop-KernelUCB that provide optimal per-agent regret. In addition, our algorithm generalizes several existing results in the multi-agent bandit setting. Finally, on a series of both synthetic and real-world multi-agent network benchmarks, we demonstrate that our algorithm significantly outperforms existing benchmarks.
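
The abstract gives no pseudocode, so below is a minimal sketch of the single-agent KernelUCB-style selection rule that kernelised contextual bandit algorithms such as Coop-KernelUCB build on; the cooperative communication over the delayed network is omitted. The RBF kernel choice, the function names, and the parameters `lam`, `beta`, and `gamma` are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix between rows of X and rows of Y (assumed kernel choice)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ucb_scores(X_hist, y_hist, X_arms, lam=1.0, beta=1.0, gamma=1.0):
    """Optimistic (UCB) scores for candidate arm contexts, given the
    (context, reward) history of one agent.

    Mean and variance follow the standard kernel-ridge-regression posterior
    used by KernelUCB-style algorithms; `lam` is the ridge regulariser and
    `beta` the exploration weight (both hypothetical values here).
    """
    K = rbf_kernel(X_hist, X_hist, gamma) + lam * np.eye(len(X_hist))
    K_inv = np.linalg.inv(K)
    k_star = rbf_kernel(X_hist, X_arms, gamma)              # shape (t, n_arms)
    mu = k_star.T @ K_inv @ y_hist                          # posterior mean per arm
    var = rbf_kernel(X_arms, X_arms, gamma).diagonal() - np.einsum(
        "ij,ik,kj->j", k_star, K_inv, k_star)               # posterior variance per arm
    return mu + beta * np.sqrt(np.maximum(var, 0.0))        # mean + exploration bonus

# Toy usage: pick the arm whose context has the highest optimistic score.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(20, 3))                           # past contexts
y_hist = np.sin(X_hist[:, 0]) + 0.1 * rng.normal(size=20)   # past rewards
X_arms = rng.normal(size=(5, 3))                            # current arm contexts
best_arm = int(np.argmax(kernel_ucb_scores(X_hist, y_hist, X_arms)))
```

The score adds an exploration bonus proportional to the posterior uncertainty to the kernel-ridge posterior mean, which is the optimism-under-uncertainty principle common to UCB-style bandit algorithms; the cooperative variant additionally shares information between networked agents to tighten each agent's estimates.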
