Paper title
Semi-Parametric Contextual Bandits with Graph-Laplacian Regularization
Paper authors
Paper abstract
Non-stationarity is ubiquitous in human behavior, and addressing it in contextual bandits is challenging. Several works have addressed the problem by investigating semi-parametric contextual bandits and have warned that ignoring non-stationarity could harm performance. Another prevalent human behavior is social interaction, which has become available in the form of a social network or graph structure. As a result, graph-based contextual bandits have received much attention. In this paper, we propose "SemiGraphTS," a novel contextual Thompson-sampling algorithm for a graph-based semi-parametric reward model. Our algorithm is the first to be proposed in this setting. We derive an upper bound on the cumulative regret that can be expressed as the product of a factor depending on the graph structure and the regret order for the semi-parametric model without a graph. We evaluate the proposed and existing algorithms via simulation and a real data example.