Paper Title
Byzantine-Resilient Decentralized TD Learning with Linear Function Approximation
Paper Authors
Paper Abstract
This paper considers the policy evaluation problem in a multi-agent reinforcement learning (MARL) environment over decentralized and directed networks. The focus is on decentralized temporal difference (TD) learning with linear function approximation in the presence of unreliable or even malicious agents, termed Byzantine agents. To evaluate the quality of a fixed policy in a common environment, agents usually run decentralized TD($\lambda$) collaboratively. However, when some Byzantine agents behave adversarially, decentralized TD($\lambda$) is unable to learn an accurate linear approximation of the true value function. We propose a trimmed-mean based Byzantine-resilient decentralized TD($\lambda$) algorithm to perform policy evaluation in this setting. We establish the finite-time convergence rate, as well as the asymptotic learning error, in the presence of Byzantine agents. Numerical experiments corroborate the robustness of the proposed algorithm.
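The two ingredients named in the abstract, coordinate-wise trimmed-mean aggregation and a local TD($\lambda$) update with linear function approximation, can be sketched together as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names `trimmed_mean` and `local_td_lambda_step`, the trimming parameter `b`, and all step sizes and feature dimensions are assumptions introduced here for illustration.

```python
import numpy as np

def trimmed_mean(received, b):
    """Coordinate-wise trimmed mean: per coordinate, drop the b largest and
    b smallest values received from in-neighbors, then average the rest.
    Assumes received.shape[0] > 2 * b."""
    sorted_vals = np.sort(received, axis=0)          # sort each coordinate independently
    return sorted_vals[b:received.shape[0] - b].mean(axis=0)

def local_td_lambda_step(theta, z, phi_s, phi_s_next, reward, alpha, gamma, lam):
    """One local TD(lambda) step with linear value approximation V(s) = phi(s)^T theta."""
    z = gamma * lam * z + phi_s                      # eligibility trace update
    delta = reward + gamma * phi_s_next @ theta - phi_s @ theta  # TD error
    return theta + alpha * delta * z, z

# Illustrative single iteration at one agent (synthetic data, hypothetical sizes).
rng = np.random.default_rng(0)
dim, num_in_neighbors, b = 4, 7, 1
received = rng.normal(size=(num_in_neighbors, dim))  # parameter vectors from in-neighbors
theta = trimmed_mean(received, b)                    # screen out extreme (possibly Byzantine) values
z = np.zeros(dim)
phi_s, phi_s_next = rng.normal(size=dim), rng.normal(size=dim)
theta, z = local_td_lambda_step(theta, z, phi_s, phi_s_next,
                                reward=1.0, alpha=0.1, gamma=0.95, lam=0.9)
```

The appeal of trimmed-mean screening is that each coordinate is filtered independently: if fewer than `b` in-neighbors are Byzantine, every value that survives the trimming is bracketed by values sent by honest agents, which bounds the influence any adversarial message can exert on the aggregate.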