VCG机理设计在随机匪徒反馈下具有未知代理值

论文标题

VCG机理设计在随机匪徒反馈下具有未知代理值

VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback

论文作者

Kandasamy, Kirthevasan, Gonzalez, Joseph E., Jordan, Michael I., Stoica, Ion

论文摘要

在代理不知道其价值的情况下，我们研究了多轮福利最大化的机制设计问题。在每个回合中，一个机制首先将分配分配给一组代理，并向他们收取价格。在回合结束时，代理商（随机）反馈给他们收到的分配机制。这种设置是由云市场和在线广告中的应用程序激励的，在该广告中，代理商只有在体验后才知道她的分配价值。因此，该机制需要探索每个代理的不同分配，以便它可以学习其价值，同时尝试找到社会上最佳的分配集。我们的重点是从长远来看古典VCG机制的真实和个人合理的机制。为此，我们首先为福利，每个代理的各个公用事业和机制的各个公用事业定义了三个遗憾概念。我们表明，这三个术语是通过$ω（t^{\ frac {2} {3}}）$相互依存的。在分配$ t $ rounds之后，这三个术语的最大限制是$ late Bond，并描述了一种实质上可以实现此速率的算法。我们的框架还提供了控制定价计划的灵活性，以使代理商和卖方遗憾之间的权衡。接下来，我们为真实性和个人合理性要求定义渐近变体，并提供渐近率，以量化所提出的算法满足两种属性的程度。

We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values. On each round, a mechanism first assigns an allocation each to a set of agents and charges them a price; at the end of the round, the agents provide (stochastic) feedback to the mechanism for the allocation they received. This setting is motivated by applications in cloud markets and online advertising where an agent may know her value for an allocation only after experiencing it. Therefore, the mechanism needs to explore different allocations for each agent so that it can learn their values, while simultaneously attempting to find the socially optimal set of allocations. Our focus is on truthful and individually rational mechanisms which imitate the classical VCG mechanism in the long run. To that end, we first define three notions of regret for the welfare, the individual utilities of each agent and that of the mechanism. We show that these three terms are interdependent via an $Ω(T^{\frac{2}{3}})$ lower bound for the maximum of these three terms after $T$ rounds of allocations, and describe an algorithm which essentially achieves this rate. Our framework also provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets. Next, we define asymptotic variants for the truthfulness and individual rationality requirements and provide asymptotic rates to quantify the degree to which both properties are satisfied by the proposed algorithm.

下载PDF全文

下载文献需遵守相关版权规定

论文标题