在线预选与Plackett-Luce模型下的上下文信息

论文标题

在线预选与Plackett-Luce模型下的上下文信息

Online Preselection with Context Information under the Plackett-Luce Model

论文作者

Mesaoudi-Paul, Adil El, Bengs, Viktor, Hüllermeier, Eyke

论文摘要

我们考虑了上下文多臂匪徒问题的扩展，其中，学习者不应该选择单个替代方案（ARM），而应以替代方案子集的形式进行预选。更具体地说，在每次迭代中，都会为学习者提供一组武器和上下文，这两者都在特征向量方面进行了描述。学习者的任务是预选这些武器的$ k $，其中在第二步中做出了最终选择。在我们的设置中，我们假设每个臂都有一个潜在的（上下文依赖性）效用，并且对预选的反馈是根据plackett-luce模型产生的。我们提出了CPPL算法，该算法灵感来自众所周知的UCB算法，并评估了该算法的合成和真实数据。特别是，我们考虑了一种在线算法选择方案，这是我们问题设置的主要动机。在这里，可以通过不同的算法（武器）来解决某个问题类（例如SAT）的实例（定义上下文），但是实际上只能运行这些算法的$ K $。

We consider an extension of the contextual multi-armed bandit problem, in which, instead of selecting a single alternative (arm), a learner is supposed to make a preselection in the form of a subset of alternatives. More specifically, in each iteration, the learner is presented a set of arms and a context, both described in terms of feature vectors. The task of the learner is to preselect $k$ of these arms, among which a final choice is made in a second step. In our setup, we assume that each arm has a latent (context-dependent) utility, and that feedback on a preselection is produced according to a Plackett-Luce model. We propose the CPPL algorithm, which is inspired by the well-known UCB algorithm, and evaluate this algorithm on synthetic and real data. In particular, we consider an online algorithm selection scenario, which served as a main motivation of our problem setting. Here, an instance (which defines the context) from a certain problem class (such as SAT) can be solved by different algorithms (the arms), but only $k$ of these algorithms can actually be run.

下载PDF全文

下载文献需遵守相关版权规定

论文标题