Paper Title

Multi-User Reinforcement Learning with Low Rank Rewards

Authors

Naman Agarwal, Prateek Jain, Suhas Kowshik, Dheeraj Nagaraj, Praneeth Netrapalli

Abstract

In this work, we consider the problem of collaborative multi-user reinforcement learning, in which multiple users share the same state-action space and transition probabilities but have different rewards. Under the assumption that the reward matrix of the $N$ users has a low-rank structure -- a standard and practically successful assumption in the offline collaborative filtering setting -- we ask whether we can design algorithms with significantly lower sample complexity than ones that learn the MDP individually for each user. Our main contribution is an algorithm that explores rewards collaboratively across the $N$ user-specific MDPs and can learn rewards efficiently in two key settings: tabular MDPs and linear MDPs. When $N$ is large and the rank is constant, the sample complexity per MDP depends logarithmically on the size of the state space, an exponential reduction (in the state-space size) compared to standard "non-collaborative" algorithms.
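The low-rank assumption at the heart of the abstract can be made concrete with a minimal NumPy sketch. This illustrates only the structural assumption, not the paper's algorithm; the user count, the number of state-action pairs, and the rank chosen below are hypothetical.

```python
import numpy as np

# Minimal sketch of the low-rank reward assumption (not the paper's algorithm):
# rewards for N users over |S|*|A| state-action pairs form a matrix
# R = U @ V.T of rank r << min(N, |S|*|A|). All dimensions are hypothetical.
rng = np.random.default_rng(0)

N, num_sa, r = 100, 1000, 3       # users, state-action pairs, rank
U = rng.normal(size=(N, r))       # user-specific latent factors
V = rng.normal(size=(num_sa, r))  # shared state-action latent factors
R = U @ V.T                       # N x |S||A| reward matrix of rank r

assert np.linalg.matrix_rank(R) == r

# R has roughly r * (N + |S||A|) degrees of freedom rather than N * |S||A|,
# which is why collaborative exploration can, in principle, recover all
# users' rewards from far fewer samples per MDP than learning each
# user's MDP in isolation.
```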
