论文标题
在线预选与Plackett-Luce模型下的上下文信息
Online Preselection with Context Information under the Plackett-Luce Model
论文作者
论文摘要
我们考虑了上下文多臂匪徒问题的扩展,其中,学习者不应该选择单个替代方案(ARM),而应以替代方案子集的形式进行预选。更具体地说,在每次迭代中,都会为学习者提供一组武器和上下文,这两者都在特征向量方面进行了描述。学习者的任务是预选这些武器的$ k $,其中在第二步中做出了最终选择。在我们的设置中,我们假设每个臂都有一个潜在的(上下文依赖性)效用,并且对预选的反馈是根据plackett-luce模型产生的。我们提出了CPPL算法,该算法灵感来自众所周知的UCB算法,并评估了该算法的合成和真实数据。特别是,我们考虑了一种在线算法选择方案,这是我们问题设置的主要动机。在这里,可以通过不同的算法(武器)来解决某个问题类(例如SAT)的实例(定义上下文),但是实际上只能运行这些算法的$ K $。
We consider an extension of the contextual multi-armed bandit problem, in which, instead of selecting a single alternative (arm), a learner is supposed to make a preselection in the form of a subset of alternatives. More specifically, in each iteration, the learner is presented a set of arms and a context, both described in terms of feature vectors. The task of the learner is to preselect $k$ of these arms, among which a final choice is made in a second step. In our setup, we assume that each arm has a latent (context-dependent) utility, and that feedback on a preselection is produced according to a Plackett-Luce model. We propose the CPPL algorithm, which is inspired by the well-known UCB algorithm, and evaluate this algorithm on synthetic and real data. In particular, we consider an online algorithm selection scenario, which served as a main motivation of our problem setting. Here, an instance (which defines the context) from a certain problem class (such as SAT) can be solved by different algorithms (the arms), but only $k$ of these algorithms can actually be run.