Paper Title
Causal Deep Reinforcement Learning Using Observational Data
Paper Authors
Paper Abstract
Deep reinforcement learning (DRL) requires the collection of interventional data, which is sometimes expensive and even unethical in the real world, for example in autonomous driving and in medicine. Offline reinforcement learning promises to alleviate this issue by exploiting the vast amount of observational data available in the real world. However, observational data may mislead the learning agent to undesirable outcomes if the behavior policy that generated the data depends on unobserved random variables (i.e., confounders). In this paper, we propose two deconfounding methods in DRL to address this problem. The methods first calculate the importance of each sample based on causal inference techniques, and then adjust the impact of different samples on the loss function by reweighting or resampling the offline dataset to ensure its unbiasedness. These deconfounding methods can be flexibly combined with existing model-free DRL algorithms such as soft actor-critic and deep Q-learning, provided that their loss functions satisfy a weak condition. We prove the effectiveness of our deconfounding methods and validate them experimentally.
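To make the reweighting/resampling idea concrete, below is a minimal, hypothetical PyTorch sketch, not the paper's actual implementation. It assumes the per-sample deconfounding weights have already been produced by the causal-inference step (e.g., inverse-propensity-style weights), and shows the two strategies the abstract mentions: scaling each transition's contribution to a DQN-style TD loss, and drawing minibatches in proportion to the weights. The names reweighted_dqn_loss, resample_indices, and the weights tensor are illustrative assumptions.

```python
# Hypothetical sketch of deconfounding an offline DQN-style update.
# Assumes `weights` is a 1-D tensor of per-sample importance weights
# computed beforehand by a causal-inference procedure (not shown here).
import torch
import torch.nn.functional as F

def reweighted_dqn_loss(q_net, target_net, batch, weights, gamma=0.99):
    """Weighted TD loss: each transition's per-sample loss is scaled by its
    deconfounding weight, so the averaged loss approximates the loss on an
    unconfounded dataset (the 'weak condition': the loss is a per-sample sum)."""
    s, a, r, s_next, done = batch  # tensors drawn from the offline dataset
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    per_sample = F.smooth_l1_loss(q_sa, target, reduction="none")
    return (weights * per_sample).mean()

def resample_indices(weights, batch_size):
    """Alternative strategy: resample the offline dataset with probability
    proportional to the deconfounding weights, then apply an ordinary
    (unweighted) loss to the drawn transitions."""
    probs = weights / weights.sum()
    return torch.multinomial(probs, batch_size, replacement=True)
```

In this sketch the two functions are interchangeable entry points for the same correction: either the loss is reweighted directly, or the dataset is resampled so that a standard model-free update (e.g., soft actor-critic or deep Q-learning) can be used unchanged on the resampled batch.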