Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

Paper Title

Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

Authors

Asaf Cassel, Alon Cohen, Tomer Koren

Abstract

We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown. Recent results in this setting have demonstrated efficient learning algorithms with regret growing with the square root of the number of decision steps. We present new efficient algorithms that achieve, perhaps surprisingly, regret that scales only (poly)logarithmically with the number of steps in two scenarios: when only the state transition matrix $A$ is unknown, and when only the state-action transition matrix $B$ is unknown and the optimal policy satisfies a certain non-degeneracy condition. On the other hand, we give a lower bound that shows that when the latter condition is violated, square root regret is unavoidable.
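The abstract does not spell out the setting. As a rough sketch, the formulation commonly used for this problem is shown below. The noise term $w_t$, the cost matrices $Q$ and $R$, and the optimal average cost $J^\star$ are notation assumed here for illustration and are not taken from the text above.

$$
x_{t+1} = A x_t + B u_t + w_t, \qquad c_t = x_t^\top Q x_t + u_t^\top R u_t, \qquad \mathrm{Regret}_T = \sum_{t=1}^{T} c_t - T J^\star
$$

Under this reading, "square root regret" means $\mathrm{Regret}_T$ grows on the order of $\sqrt{T}$, while the two scenarios described in the abstract admit growth that is only polynomial in $\log T$.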
