Paper Title

Private Reinforcement Learning with PAC and Regret Guarantees

Authors

Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Zhiwei Steven Wu

Abstract

Motivated by high-stakes decision-making domains like personalized medicine where user information is inherently sensitive, we design privacy preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP)--a strong variant of differential privacy for settings where each user receives their own sets of output (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter only appears in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.
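
To make the "moderate privacy cost" claim concrete, below is a minimal Python sketch of the general recipe behind private optimism-based exploration: release visit counts through a noise-adding mechanism, then enlarge the optimism bonus just enough to absorb that noise. The function names, the plain Laplace mechanism, and the bonus constants are illustrative assumptions for this sketch; the paper's actual algorithm uses binary-tree counters for tighter composition across episodes and differently calibrated bonuses.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatized_counts(true_counts: np.ndarray, epsilon: float) -> np.ndarray:
    """Release per-(state, action) visit counts with Laplace noise.

    Illustrative only: the paper privatizes its statistics with
    binary-tree counters rather than one-shot Laplace noise.
    """
    noise = rng.laplace(scale=1.0 / epsilon, size=true_counts.shape)
    return true_counts + noise

def optimistic_bonus(noisy_counts: np.ndarray, epsilon: float,
                     horizon: int, delta: float = 0.05) -> np.ndarray:
    """UCB-style exploration bonus computed from the *noisy* counts.

    The bonus is widened by an O(1/(n * epsilon)) correction that
    absorbs the privatization noise; since it shrinks faster than the
    O(1/sqrt(n)) sampling term, privacy only affects lower-order terms
    of the resulting PAC and regret bounds.
    """
    # Clip so the noisy counts remain usable as denominators.
    n = np.maximum(noisy_counts, 1.0)
    sampling_term = horizon * np.sqrt(np.log(1.0 / delta) / n)    # O(1/sqrt(n))
    privacy_term = horizon * np.log(1.0 / delta) / (n * epsilon)  # O(1/(n*eps))
    return sampling_term + privacy_term

# Toy usage: 4 states x 2 actions, some true visit counts.
counts = np.array([[10., 3.], [25., 1.], [7., 7.], [0., 12.]])
noisy = privatized_counts(counts, epsilon=1.0)
print(optimistic_bonus(noisy, epsilon=1.0, horizon=5))
```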
