Paper Title

Meta Policy Learning for Cold-Start Conversational Recommendation

Authors

Zhendong Chu, Hongning Wang, Yun Xiao, Bo Long, Lingfei Wu

Abstract

Conversational recommender systems (CRS) explicitly solicit users' preferences to improve recommendations on the fly. Most existing CRS solutions rely on a single policy trained by reinforcement learning over a population of users. However, for users new to the system, such a global policy is ineffective at satisfying them; this is the cold-start challenge. In this paper, we study CRS policy learning for cold-start users via meta-reinforcement learning. We propose to learn a meta policy and adapt it to new users with only a few trials of conversational recommendations. To facilitate fast policy adaptation, we design three synergetic components. First, we design a meta-exploration policy dedicated to identifying user preferences via a few exploratory conversations, which accelerates personalized policy adaptation from the meta policy. Second, we adapt the item recommendation module for each user to maximize recommendation quality based on the conversation states collected during conversations. Third, we propose a Transformer-based state encoder as the backbone that connects the previous two components. It provides comprehensive state representations by modeling the complicated relations between positive and negative feedback during the conversation. Extensive experiments on three datasets demonstrate the advantage of our solution in serving new users, compared with a rich set of state-of-the-art CRS solutions.
