论文标题
统治或删除:串行独裁统治的分散竞争匪徒
Dominate or Delete: Decentralized Competing Bandits in Serial Dictatorship
论文作者
论文摘要
在双面匹配市场中的在线学习,需求侧代理不断竞争与供应方(武器)匹配,将匹配平台上部分信息下的复杂交互(例如UPWork,TaskRabbit)提取。我们研究了分散的串行独裁统治环境,这是一个双面匹配的市场,需求侧代理在供应方面(武器)的需求方面的估值未知和异质估值,而武器对需求方(代理)的均匀偏好。我们设计了第一种分散的算法-UCB,具有分散的主导臂删除(UCB-D3),对于代理商而言,不需要任何奖励差距或时间范围的知识。 UCB-D3分阶段工作,在每个阶段,代理删除\ emph {主臂} - 较高排名的代理首选的手臂,并且仅根据UCB播放。在阶段结束时,代理商以分散的方式广播,他们估计的首选武器是通过{\ em纯剥削}进行的。我们证明了这两个新的遗憾下放串行独裁统治模型的新遗憾,并且UCB-D3是最佳的订单。
Online learning in a two-sided matching market, with demand side agents continuously competing to be matched with supply side (arms), abstracts the complex interactions under partial information on matching platforms (e.g. UpWork, TaskRabbit). We study the decentralized serial dictatorship setting, a two-sided matching market where the demand side agents have unknown and heterogeneous valuation over the supply side (arms), while the arms have known uniform preference over the demand side (agents). We design the first decentralized algorithm -- UCB with Decentralized Dominant-arm Deletion (UCB-D3), for the agents, that does not require any knowledge of reward gaps or time horizon. UCB-D3 works in phases, where in each phase, agents delete \emph{dominated arms} -- the arms preferred by higher ranked agents, and play only from the non-dominated arms according to the UCB. At the end of the phase, agents broadcast in a decentralized fashion, their estimated preferred arms through {\em pure exploitation}. We prove both, a new regret lower bound for the decentralized serial dictatorship model, and that UCB-D3 is order optimal.