Paper Title

How are policy gradient methods affected by the limits of control?

Authors

Ingvar Ziemann, Anastasios Tsiamis, Henrik Sandberg, Nikolai Matni

Abstract

We study stochastic policy gradient methods from the perspective of control-theoretic limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle inevitably lead to noisy gradient estimates. We also give an example of a class of stable systems in which policy gradient methods suffer from the curse of dimensionality. Our results apply to both state feedback and partially observed systems.
