Paper Title

Private Reinforcement Learning with PAC and Regret Guarantees

Authors

Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Zhiwei Steven Wu

Abstract

Motivated by high-stakes decision-making domains like personalized medicine where user information is inherently sensitive, we design privacy preserving exploration policies for episodic reinforcement learning (RL). We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP)--a strong variant of differential privacy for settings where each user receives their own sets of output (e.g., policy recommendations). We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee. Our algorithm only pays for a moderate privacy cost on exploration: in comparison to the non-private bounds, the privacy parameter only appears in lower-order terms. Finally, we present lower bounds on sample complexity and regret for reinforcement learning subject to JDP.
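
To make the "moderate privacy cost" claim concrete, below is a minimal Python sketch of the general recipe behind private optimism-based exploration: release visit counts through a noise-adding mechanism, then enlarge the optimism bonus just enough to absorb that noise. The function names, the plain Laplace mechanism, and the bonus constants are illustrative assumptions for this sketch; the paper's actual algorithm uses binary-tree counters for tighter composition across episodes and differently calibrated bonuses.

```python
import numpy as np

rng = np.random.default_rng(0)

def privatized_counts(true_counts: np.ndarray, epsilon: float) -> np.ndarray:
    """Release per-(state, action) visit counts with Laplace noise.

    Illustrative only: the paper privatizes its statistics with
    binary-tree counters rather than one-shot Laplace noise.
    """
    noise = rng.laplace(scale=1.0 / epsilon, size=true_counts.shape)
    return true_counts + noise

def optimistic_bonus(noisy_counts: np.ndarray, epsilon: float,
                     horizon: int, delta: float = 0.05) -> np.ndarray:
    """UCB-style exploration bonus computed from the *noisy* counts.

    The bonus is widened by an O(1/(n * epsilon)) correction that
    absorbs the privatization noise; since it shrinks faster than the
    O(1/sqrt(n)) sampling term, privacy only affects lower-order terms
    of the resulting PAC and regret bounds.
    """
    # Clip so the noisy counts remain usable as denominators.
    n = np.maximum(noisy_counts, 1.0)
    sampling_term = horizon * np.sqrt(np.log(1.0 / delta) / n)    # O(1/sqrt(n))
    privacy_term = horizon * np.log(1.0 / delta) / (n * epsilon)  # O(1/(n*eps))
    return sampling_term + privacy_term

# Toy usage: 4 states x 2 actions, some true visit counts.
counts = np.array([[10., 3.], [25., 1.], [7., 7.], [0., 12.]])
noisy = privatized_counts(counts, epsilon=1.0)
print(optimistic_bonus(noisy, epsilon=1.0, horizon=5))
```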
