Paper Title

Exposure-Aware Recommendation using Contextual Bandits

Paper Authors

Masoud Mansoury, Bamshad Mobasher, Herke van Hoof

Paper Abstract

Exposure bias is a well-known issue in recommender systems where items and suppliers are not equally represented in the recommendation results. This is especially problematic when the bias is amplified over time: a few items (e.g., popular ones) are repeatedly over-represented in recommendation lists, and users' interactions with those items further amplify the bias towards them, resulting in a feedback loop. This issue has been extensively studied in the literature on model-based or neighborhood-based recommendation algorithms, but less work has been done on online recommendation models, such as those based on top-K contextual bandits, where the recommendation model is dynamically updated with ongoing user feedback. In this paper, we study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits. We analyze the ability of these algorithms to handle exposure bias and provide a fair representation of items in the recommendation results. Our analysis reveals that these algorithms tend to amplify exposure disparity among items over time. In particular, we observe that these algorithms do not properly adapt to the feedback provided by users and frequently recommend certain items even when those items are not selected by users. To mitigate this bias, we propose an Exposure-Aware (EA) reward model that updates the model parameters based on two factors: 1) user feedback (i.e., whether the item was clicked or not), and 2) the position of the item in the recommendation list. In this way, the proposed model controls the utility assigned to items based on their exposure in the recommendation list. Extensive experiments on two real-world datasets with three contextual bandit algorithms show that the proposed reward model reduces exposure bias amplification in the long run while maintaining recommendation accuracy.
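To make the idea in the abstract concrete, below is a minimal sketch of a LinUCB-style linear cascading bandit with a position-dependent reward. It is not the paper's exact EA formulation: the geometric exposure weight `gamma ** position`, the negative reward for exposed-but-unclicked slots, and all class and function names here are illustrative assumptions only.

```python
import numpy as np

def exposure_aware_reward(position, clicked, gamma=0.9):
    """Illustrative exposure-aware reward (assumed form, not the paper's exact model):
    a clicked item receives full reward, while an unclicked item is penalized in
    proportion to the exposure of its slot (higher slots get a larger penalty)."""
    examination = gamma ** position            # assumed position-based exposure weight
    return 1.0 if clicked else -examination    # penalize exposed-but-unclicked items

class LinearCascadingBandit:
    """Minimal LinUCB-style cascading bandit with a pluggable reward model."""

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha
        self.A = lam * np.eye(dim)   # ridge-regression Gram matrix
        self.b = np.zeros(dim)       # response vector

    def recommend(self, features, k):
        """Return the indices of the top-k items by UCB score."""
        theta = np.linalg.solve(self.A, self.b)
        A_inv = np.linalg.inv(self.A)
        ucb = features @ theta + self.alpha * np.sqrt(
            np.einsum("ij,jk,ik->i", features, A_inv, features))
        return np.argsort(-ucb)[:k]

    def update(self, features, ranked_items, click_position):
        """Update parameters for every examined slot using the exposure-aware reward.
        In the cascade model, slots up to and including the clicked one are examined;
        if nothing is clicked, the whole list is treated as examined."""
        last = click_position if click_position is not None else len(ranked_items) - 1
        for pos, item in enumerate(ranked_items[: last + 1]):
            clicked = (pos == click_position)
            r = exposure_aware_reward(pos, clicked)
            x = features[item]
            self.A += np.outer(x, x)
            self.b += r * x

# Toy usage: 50 items with 8-dimensional features, recommendation lists of length 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
bandit = LinearCascadingBandit(dim=8)
ranking = bandit.recommend(X, k=5)
bandit.update(X, ranking, click_position=2)  # user clicked the third slot
```

The key difference from a standard cascading-bandit update is that unclicked-but-exposed items contribute a position-weighted negative signal instead of a flat zero, which is one plausible way to keep repeatedly exposed items from retaining high estimated utility when users do not select them.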
