Title
Learning to Collaborate in Multi-Module Recommendation via Multi-Agent Reinforcement Learning without Communication
Authors
Abstract
With the rise of online e-commerce platforms, more and more customers prefer to shop online. To sell more products, online platforms introduce various modules that recommend items with particular properties, such as huge discounts. A web page therefore often consists of several independent modules. The ranking policies of these modules are decided by different teams and optimized individually without cooperation, which can lead to competition between modules; as a result, the global policy of the whole page may be sub-optimal. In this paper, we propose a novel multi-agent cooperative reinforcement learning approach under the restriction that the modules cannot communicate. Our contributions are threefold. First, inspired by correlated equilibrium, a solution concept from game theory, we design a signal network that promotes cooperation among all modules by generating a signal (a vector) for each module. Second, we propose an entropy-regularized version of the signal network to coordinate the agents' exploration of the optimal global policy. Finally, experiments on real-world e-commerce data demonstrate that our algorithm achieves superior performance over the baselines.
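To make the central mechanism concrete, the following is a minimal sketch (not the paper's implementation) of a signal network and an entropy regularizer, with all names, dimensions, and the linear parameterization chosen here for illustration: a shared network maps the page-level state to one signal vector per module, each agent later conditions its local policy on its own signal only, and the entropy term rewards diverse signals to coordinate exploration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MODULES = 3    # hypothetical: number of independent page modules
SIGNAL_DIM = 4   # hypothetical: dimensionality of each signal vector
STATE_DIM = 8    # hypothetical: shared page-level state features

# A toy linear "signal network": one independent head per module.
# Agents never exchange messages; they only read their own signal,
# which plays the role of the shared signal in a correlated equilibrium.
W = rng.normal(scale=0.1, size=(N_MODULES, SIGNAL_DIM, STATE_DIM))

def signal_network(state):
    """Map the shared state to a (N_MODULES, SIGNAL_DIM) array of signals."""
    return W @ state  # batched matmul: (3, 4, 8) @ (8,) -> (3, 4)

def entropy_regularizer(logits):
    """Entropy of a softmax distribution over one module's signal logits.

    Adding this term as a bonus to the signal network's objective
    discourages collapsing to a single signal and thus encourages
    coordinated exploration, as in entropy-regularized RL.
    """
    p = np.exp(logits - logits.max())  # stable softmax
    p /= p.sum()
    return float(-(p * np.log(p + 1e-12)).sum())

state = rng.normal(size=STATE_DIM)
signals = signal_network(state)
print(signals.shape)  # (3, 4): one signal vector per module
```

In a full system the linear heads would be replaced by a learned neural network and the entropy bonus weighted into the training loss; this sketch only shows the data flow from shared state to per-module signals.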