Paper Title

Kernel Methods for Cooperative Multi-Agent Contextual Bandits

Paper Authors

Abhimanyu Dubey, Alex Pentland

Paper Abstract

Cooperative multi-agent decision making involves a group of agents cooperatively solving learning problems while communicating over a network with delays. In this paper, we consider the kernelised contextual bandit problem, where the reward obtained by an agent is an arbitrary linear function of the contexts' images in the related reproducing kernel Hilbert space (RKHS), and a group of agents must cooperate to collectively solve their unique decision problems. For this problem, we propose Coop-KernelUCB, an algorithm that provides near-optimal bounds on the per-agent regret, and is both computationally and communicatively efficient. For special cases of the cooperative problem, we also provide variants of Coop-KernelUCB that provide optimal per-agent regret. In addition, our algorithm generalizes several existing results in the multi-agent bandit setting. Finally, on a series of both synthetic and real-world multi-agent network benchmarks, we demonstrate that our algorithm significantly outperforms existing benchmarks.
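
The abstract gives no pseudocode, so below is a minimal sketch of the single-agent KernelUCB-style selection rule that kernelised contextual bandit algorithms such as Coop-KernelUCB build on; the cooperative communication over the delayed network is omitted. The RBF kernel choice, the function names, and the parameters `lam`, `beta`, and `gamma` are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix between rows of X and rows of Y (assumed kernel choice)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_ucb_scores(X_hist, y_hist, X_arms, lam=1.0, beta=1.0, gamma=1.0):
    """Optimistic (UCB) scores for candidate arm contexts, given the
    (context, reward) history of one agent.

    Mean and variance follow the standard kernel-ridge-regression posterior
    used by KernelUCB-style algorithms; `lam` is the ridge regulariser and
    `beta` the exploration weight (both hypothetical values here).
    """
    K = rbf_kernel(X_hist, X_hist, gamma) + lam * np.eye(len(X_hist))
    K_inv = np.linalg.inv(K)
    k_star = rbf_kernel(X_hist, X_arms, gamma)              # shape (t, n_arms)
    mu = k_star.T @ K_inv @ y_hist                          # posterior mean per arm
    var = rbf_kernel(X_arms, X_arms, gamma).diagonal() - np.einsum(
        "ij,ik,kj->j", k_star, K_inv, k_star)               # posterior variance per arm
    return mu + beta * np.sqrt(np.maximum(var, 0.0))        # mean + exploration bonus

# Toy usage: pick the arm whose context has the highest optimistic score.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(20, 3))                           # past contexts
y_hist = np.sin(X_hist[:, 0]) + 0.1 * rng.normal(size=20)   # past rewards
X_arms = rng.normal(size=(5, 3))                            # current arm contexts
best_arm = int(np.argmax(kernel_ucb_scores(X_hist, y_hist, X_arms)))
```

The score adds an exploration bonus proportional to the posterior uncertainty to the kernel-ridge posterior mean, which is the optimism-under-uncertainty principle common to UCB-style bandit algorithms; the cooperative variant additionally shares information between networked agents to tighten each agent's estimates.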
