Ranking in Generalized Linear Bandits

Paper Title

Ranking In Generalized Linear Bandits

Authors

Amitis Shidani, George Deligiannidis, Arnaud Doucet

Abstract

We study the ranking problem in generalized linear bandits. At each time step, the learning agent selects an ordered list of items and observes stochastic outcomes. In recommendation systems, displaying an ordered list of the most attractive items is not always optimal, because both position and item dependencies result in a complex reward function. A simple example is the lack of diversity when all the most attractive items are from the same category. We model the position and item dependencies in the ordered list and design UCB and Thompson Sampling type algorithms for this problem. Our work generalizes existing studies in several directions, including position dependencies, of which position discount is a particular case, and connects the ranking problem to graph theory.
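The diversity example in the abstract can be made concrete with a small sketch. The code below is a toy illustration only and is not the model or algorithm from the paper: the item scores, the category labels, the position discounts, the logistic link, and the same-category penalty are all assumptions chosen for illustration. It shows that sorting by attractiveness is optimal under a pure position discount, but stops being optimal once a simple item dependency is introduced.

```python
import math
from itertools import permutations


def sigmoid(z):
    """Logistic link, a common choice in generalized linear models."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: item -> (attractiveness score, category).
ITEMS = {
    "a1": (2.0, "action"),
    "a2": (1.8, "action"),
    "a3": (1.6, "action"),
    "c1": (1.0, "comedy"),
}
DISCOUNT = [1.0, 0.7, 0.5]  # assumed position discounts for a 3-slot list
PENALTY = 1.5               # assumed score penalty per earlier same-category item


def expected_reward(ranking, penalty):
    """Sum of discounted click probabilities for an ordered list.

    Each item's score is reduced by `penalty` for every item of the same
    category that already appears higher in the list.
    """
    total = 0.0
    seen = {}
    for pos, item in enumerate(ranking):
        score, category = ITEMS[item]
        adjusted = score - penalty * seen.get(category, 0)
        total += DISCOUNT[pos] * sigmoid(adjusted)
        seen[category] = seen.get(category, 0) + 1
    return total


def best_ranking(penalty):
    """Brute-force search over all ordered lists of length len(DISCOUNT)."""
    candidates = permutations(ITEMS, len(DISCOUNT))
    return max(candidates, key=lambda r: expected_reward(r, penalty))


# Greedy baseline: the three most attractive items, most attractive first.
greedy = tuple(sorted(ITEMS, key=lambda i: -ITEMS[i][0])[:len(DISCOUNT)])

print("greedy list:", greedy)
print("best list, position discount only:", best_ranking(0.0))
print("best list, with category penalty:", best_ranking(PENALTY))
```

With the penalty set to zero the brute-force optimum coincides with the greedy list, which is the position-discount special case. With the penalty switched on, the optimum includes the less attractive item from the other category. Brute force is used here only because the toy instance is tiny; it does not scale to realistic catalogs.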
