Paper Title
Convergence results for an averaged LQR problem with applications to reinforcement learning
Paper Authors
Paper Abstract
In this paper, we deal with a Linear Quadratic Optimal Control problem with unknown dynamics. As a modeling assumption, we suppose that the knowledge the agent has of the system is represented by a probability distribution $π$ on the space of matrices. Furthermore, we assume that this probability measure is suitably updated to account for the experience the agent accumulates while exploring the environment, so that it approximates the underlying dynamics with increasing accuracy. Under these assumptions, we show that the optimal control obtained by solving the "averaged" Linear Quadratic Optimal Control problem with respect to a given $π$ converges to the optimal control of the Linear Quadratic Optimal Control problem governed by the actual, underlying dynamics. This approach is closely related to model-based Reinforcement Learning algorithms, in which prior and posterior probability distributions describing the knowledge of the uncertain system are recursively updated. In the last section, we present a numerical test that confirms the theoretical results.
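For intuition, a minimal sketch of the averaged problem in assumed notation (the matrices $A$, $B$, the weights $Q$, $R$, the horizon $T$, and the initial state $x_0$ are illustrative; the paper's exact formulation may differ): the dynamics matrix $A$ is drawn from $π$, the state $x_A$ solves $\dot{x}_A = A x_A + B u$ with $x_A(0) = x_0$, and the control $u$ minimizes the expected quadratic cost

\[
J_π(u) = \mathbb{E}_{A \sim π}\!\left[\int_0^T x_A(t)^\top Q\, x_A(t) + u(t)^\top R\, u(t)\,\mathrm{d}t\right].
\]

The convergence statement then says, informally, that if a sequence of measures $π_n$ concentrates around the true dynamics matrix, the minimizers of $J_{π_n}$ converge to the optimal control of the problem governed by the true dynamics.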
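The model-based loop alluded to in the abstract can also be sketched numerically. The toy script below is an illustrative assumption, not the paper's algorithm: it replaces the averaged problem by an LQR solve at a point estimate of $A$ (standing in for the mean of the current distribution), and replaces the Bayesian posterior update by a least-squares refit on fresh data whose noise shrinks over iterations, mimicking the increasing accuracy of $π$. All names ($A_{\mathrm{true}}$, the noise schedule, batch sizes) are hypothetical.

import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(0)
n, m = 2, 1
A_true = np.array([[0.0, 1.0], [-1.0, -0.5]])   # unknown to the agent
B = np.array([[0.0], [1.0]])
Q, R = np.eye(n), np.eye(m)

def lqr_gain(A):
    """Continuous-time LQR gain K = R^{-1} B^T P for dynamics (A, B)."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)

A_est = np.array([[0.0, 1.0], [0.0, 0.0]])      # crude initial model (controllable with B)
for k in range(1, 20):
    K = lqr_gain(A_est)                          # control for the current averaged/mean model
    # Synthetic exploration data: noisy derivatives dx = A_true x + B u + noise,
    # with noise shrinking in k to mimic the measure concentrating on A_true.
    X = rng.standard_normal((50, n))
    U = -(K @ X.T).T
    dX = X @ A_true.T + U @ B.T + (0.1 / k) * rng.standard_normal((50, n))
    # Least-squares refit of A from (x, u, dx), standing in for the posterior update.
    A_est = np.linalg.lstsq(X, dX - U @ B.T, rcond=None)[0].T

print("gain error:", np.linalg.norm(lqr_gain(A_est) - lqr_gain(A_true)))

As the model estimate improves, the resulting feedback gain approaches the gain computed from the true dynamics, which is the qualitative behavior the paper's convergence result makes precise.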