Title

A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting

Authors

Philip Amortila, Nan Jiang, Tengyang Xie

Abstract

Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case. In this note we show that once adapted to the discounted setting, the construction can be simplified to a 2-state MDP with 1-dimensional features, such that learning is impossible even with an infinite amount of data.
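
For context, "linearly realizable value function" in this line of work typically refers to linear realizability of the optimal value function under a known feature map. A minimal statement of the assumption in the discounted setting (a standard paraphrase, not necessarily the paper's exact definition) is:

    Q^*(s, a) \;=\; \langle \phi(s, a),\, \theta^* \rangle \quad \text{for all } (s, a), \text{ for some } \theta^* \in \mathbb{R}^d,

where $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ is a known feature map. Per the abstract, the note's construction needs only $d = 1$ and two states, yet learning remains impossible even with an infinite amount of data.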
