Paper Title
FedFormer: Contextual Federation with Attention in Reinforcement Learning
Paper Authors
Paper Abstract
A core issue in multi-agent federated reinforcement learning is defining how to aggregate insights from multiple agents. This is commonly done by averaging each participating agent's model weights into one common model (FedAvg). We instead propose FedFormer, a novel federation strategy that uses Transformer attention to contextually aggregate embeddings from models originating from different learner agents. In so doing, we attentively weigh the contributions of other agents with respect to the current agent's environment and learned relationships, providing a more effective and efficient federation. We evaluate our methods on the Meta-World environment and find that our approach yields significant improvements over FedAvg and non-federated single-agent Soft Actor-Critic methods. Compared to Soft Actor-Critic, FedFormer achieves higher episodic return while still abiding by the privacy constraints of federated learning. Finally, we also demonstrate improved effectiveness with larger agent pools across all methods on certain tasks. This contrasts with FedAvg, which fails to make noticeable improvements when scaled.
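For intuition, the sketch below (PyTorch, not the authors' code) contrasts FedAvg-style parameter averaging with an attention-based contextual aggregation of per-agent embeddings, as the abstract describes. The `fedavg` helper, the `AttentionFederation` module, and all tensor shapes are illustrative assumptions rather than the paper's actual architecture.

```python
# Minimal sketch, assuming each agent exposes either its model weights (for FedAvg)
# or an embedding of the current observation (for attention-based federation).
# Names and shapes are hypothetical and chosen only to illustrate the contrast.
import torch
import torch.nn as nn


def fedavg(state_dicts):
    """FedAvg baseline: element-wise mean of each parameter across agents."""
    avg = {}
    for key in state_dicts[0]:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg


class AttentionFederation(nn.Module):
    """Hypothetical attention-based federation layer.

    The current agent's embedding attends over embeddings produced by the other
    agents' encoders, so peer contributions are weighted by their relevance to
    the current agent's observation instead of being averaged uniformly.
    """

    def __init__(self, embed_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, own_emb: torch.Tensor, peer_embs: torch.Tensor) -> torch.Tensor:
        # own_emb:   (batch, embed_dim)          embedding from the local encoder
        # peer_embs: (batch, n_peers, embed_dim) embeddings from peer encoders
        query = own_emb.unsqueeze(1)                       # (batch, 1, embed_dim)
        fused, _ = self.attn(query, peer_embs, peer_embs)  # contextual aggregation
        return fused.squeeze(1) + own_emb                  # residual keeps local info


if __name__ == "__main__":
    batch, n_peers, dim = 8, 5, 32
    own = torch.randn(batch, dim)
    peers = torch.randn(batch, n_peers, dim)
    fused = AttentionFederation(dim)(own, peers)
    print(fused.shape)  # torch.Size([8, 32])
```

Under this reading, privacy is preserved because only encoder outputs (embeddings) cross agent boundaries, not raw trajectories, and adding more agents simply lengthens the key/value sequence the attention layer weighs.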