Paper Title

Online Allocation and Learning in the Presence of Strategic Agents

Paper Authors

Steven Yin, Shipra Agrawal, Assaf Zeevi

Abstract

We study the problem of allocating $T$ sequentially arriving items among $n$ homogeneous agents under the constraint that each agent must receive a pre-specified fraction of all items, with the objective of maximizing the agents' total valuation of items allocated to them. The agents' valuations for the item in each round are assumed to be i.i.d. but their distribution is a priori unknown to the central planner. Therefore, the central planner needs to implicitly learn these distributions from the observed values in order to pick a good allocation policy. However, an added challenge here is that the agents are strategic with incentives to misreport their valuations in order to receive better allocations. This sets our work apart both from the online auction design settings which typically assume known valuation distributions and/or involve payments, and from the online learning settings that do not consider strategic agents. To that end, our main contribution is an online learning based allocation mechanism that is approximately Bayesian incentive compatible, and when all agents are truthful, guarantees a sublinear regret for individual agents' utility compared to that under the optimal offline allocation policy.
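To make the problem setting concrete, here is a minimal toy sketch of the sequential allocation constraint described above: `T` items arrive one at a time, each of `n` agents must end up with a pre-specified fraction of all items, and in each round the planner allocates based on reported values. The greedy rule used here (give the item to the highest-reporting agent with remaining quota) is an illustrative baseline only, not the paper's mechanism; all names and parameters are assumptions for illustration.

```python
import random

def simulate_greedy_allocation(n=3, T=300, fractions=(0.5, 0.3, 0.2), seed=0):
    """Toy model of the setting: T sequentially arriving items, n agents,
    agent i constrained to receive fractions[i] * T items in total.
    Values are i.i.d. draws; the planner greedily allocates each item to
    the highest-reporting agent that still has quota left."""
    rng = random.Random(seed)
    quotas = [round(f * T) for f in fractions]
    quotas[0] += T - sum(quotas)  # absorb rounding leftovers
    counts = [0] * n              # items allocated to each agent so far
    welfare = 0.0                 # total valuation of allocated items
    for _ in range(T):
        # i.i.d. reported values for this round's item (truthful here)
        values = [rng.random() for _ in range(n)]
        eligible = [i for i in range(n) if counts[i] < quotas[i]]
        winner = max(eligible, key=lambda i: values[i])
        counts[winner] += 1
        welfare += values[winner]
    return counts, quotas, welfare

counts, quotas, welfare = simulate_greedy_allocation()
# By construction every agent ends with exactly its quota of items.
assert counts == quotas
```

Note that this baseline is exactly where the strategic issue arises: because there are no payments, an agent can inflate its reports to win more items, which is why the paper designs an approximately Bayesian incentive compatible mechanism instead of this naive greedy rule.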
