预算组合多军匪

论文标题

预算组合多军匪

Budgeted Combinatorial Multi-Armed Bandits

论文作者

Das, Debojit, Jain, Shweta, Gujar, Sujit

论文摘要

我们考虑了预算的组合多臂强盗设置，在每个回合中，该算法选择一个由一个或多个手臂组成的超级臂。目标是在预算有限的所有回合之后，最大程度地减少预期的遗憾。本文中的现有技术要么修理预算，要么修复每轮中拉动的武器数量。我们的设置更为笼统，基于剩余的预算和剩余的回合数量，算法可以决定在每个回合中要拉多少臂。首先，我们建议使用贪婪技术CBWK-Greedy的CBWK-Greedy-UCB算法将手臂分配给圆圈。接下来，我们提出将这个问题减少到带有背包（BWK）的匪徒。通过这种减少，我们提出了巧妙地使用PrimallaldualBWK的CBWK-LPUCB。我们严格地证明了CBWK-LP-UCB的遗憾。我们通过实验比较了两种算法，并观察到CBWK-Greedy-UCB的性能比CBWK-LP-UCB逐渐更好。我们还表明，对于非常高的预算，遗憾的是零。

We consider a budgeted combinatorial multi-armed bandit setting where, in every round, the algorithm selects a super-arm consisting of one or more arms. The goal is to minimize the total expected regret after all rounds within a limited budget. Existing techniques in this literature either fix the budget per round or fix the number of arms pulled in each round. Our setting is more general where based on the remaining budget and remaining number of rounds, the algorithm can decide how many arms to be pulled in each round. First, we propose CBwK-Greedy-UCB algorithm, which uses a greedy technique, CBwK-Greedy, to allocate the arms to the rounds. Next, we propose a reduction of this problem to Bandits with Knapsacks (BwK) with a single pull. With this reduction, we propose CBwK-LPUCB that uses PrimalDualBwK ingeniously. We rigorously prove regret bounds for CBwK-LP-UCB. We experimentally compare the two algorithms and observe that CBwK-Greedy-UCB performs incrementally better than CBwK-LP-UCB. We also show that for very high budgets, the regret goes to zero.

下载PDF全文

下载文献需遵守相关版权规定

论文标题