Paper Title
Transfer in Reinforcement Learning via Regret Bounds for Learning Agents
Paper Authors
Paper Abstract
We present an approach for quantifying the usefulness of transfer in reinforcement learning via regret bounds for a multi-agent setting. Considering $\aleph$ agents operating in the same Markov decision process, though possibly with different reward functions, we consider the regret each agent suffers with respect to an optimal policy maximizing her average reward. We show that when the agents share their observations, the total regret of all agents is smaller by a factor of $\sqrt{\aleph}$ compared to the case when each agent has to rely only on the information she collects herself. This result demonstrates how considering regret in multi-agent settings can provide theoretical bounds on the benefit of sharing observations in transfer learning.
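As a rough illustration of where a $\sqrt{\aleph}$ improvement can come from (the abstract states no explicit bounds, so the single-agent rate $\tilde{O}(DS\sqrt{AT})$ below is an assumption borrowed from UCRL2-style analyses, with $D$ the diameter, $S$ the number of states, $A$ the number of actions, and $T$ the horizon):

$$
\underbrace{\sum_{i=1}^{\aleph} \tilde{O}\bigl(DS\sqrt{AT}\bigr)}_{\text{no sharing}}
= \tilde{O}\bigl(DS\,\aleph\sqrt{AT}\bigr),
\qquad
\underbrace{\tilde{O}\bigl(DS\sqrt{A\,\aleph T}\bigr)}_{\text{shared observations}}
= \frac{1}{\sqrt{\aleph}}\,\tilde{O}\bigl(DS\,\aleph\sqrt{AT}\bigr).
$$

Intuitively, pooling observations lets each agent learn from $\aleph T$ transition samples rather than $T$, so the joint exploration cost scales like that of a single agent run for $\aleph T$ steps; this is only a heuristic sketch under the assumed rate, not the paper's actual argument.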