Paper Title

Byzantine-Robust Online and Offline Distributed Reinforcement Learning

Paper Authors

Yiding Chen, Xuezhou Zhang, Kaiqing Zhang, Mengdi Wang, Xiaojin Zhu

Paper Abstract

We consider a distributed reinforcement learning setting where multiple agents separately explore the environment and communicate their experiences through a central server. However, an $\alpha$-fraction of the agents are adversarial and can report arbitrary fake information. Critically, these adversarial agents can collude, and their fake data can be of any size. We aim to robustly identify a near-optimal policy for the underlying Markov decision process in the presence of these adversaries. Our main technical contribution is Weighted-Clique, a novel algorithm for the problem of robust mean estimation from batches that can handle arbitrary batch sizes. Building upon this new estimator, in the offline setting we design a Byzantine-robust distributed pessimistic value iteration algorithm; in the online setting we design a Byzantine-robust distributed optimistic value iteration algorithm. Both algorithms obtain near-optimal sample complexity and achieve a stronger robustness guarantee than prior work.
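To make the "robust mean estimation from batches" setting concrete, here is a minimal Python sketch of a simple baseline aggregator: the per-agent median of batch means. Because each agent contributes exactly one vote, an $\alpha < 1/2$ fraction of colluding agents cannot move the estimate no matter how large their fake batches are. This is only an illustration of the problem interface; the function name `median_of_batch_means` is hypothetical, and this baseline is not the paper's Weighted-Clique estimator, which (per the abstract) additionally handles arbitrary batch sizes with tighter guarantees.

```python
import numpy as np

def median_of_batch_means(batches):
    """Baseline robust aggregator for mean estimation from batches.

    Each agent reports one batch of scalar samples; we take the median
    of the per-batch means, so each agent gets a single vote regardless
    of its (possibly fake and arbitrarily large) batch size.
    NOTE: illustrative sketch only, not the paper's Weighted-Clique.
    """
    return float(np.median([np.mean(b) for b in batches]))

# Toy run: 8 honest agents sample around mean 1.0 with varying batch
# sizes; 2 colluding adversaries (alpha = 0.2) report huge fake batches.
rng = np.random.default_rng(0)
honest = [rng.normal(1.0, 0.5, size=int(n)) for n in rng.integers(5, 50, 8)]
fake = [np.full(10_000, 100.0) for _ in range(2)]  # arbitrary batch size
print(median_of_batch_means(honest + fake))  # ~1.0, despite the adversaries
```

A naive batch-size-weighted average would fail here: the adversaries could inflate their reported batch sizes to dominate the estimate, which is exactly the "fake data of any size" difficulty the abstract highlights.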
