Paper title
Learning Stabilizing Policies in Stochastic Control Systems
Authors
Abstract
In this work, we address the problem of learning provably stable neural network policies for stochastic control systems. While recent work has demonstrated the feasibility of certifying given policies using martingale theory, the problem of how to learn such policies has received little attention. Here, we study the effectiveness of jointly learning a policy together with a martingale certificate that proves its stability, using a single learning algorithm. We observe that the joint optimization problem easily gets stuck in local minima when starting from a randomly initialized policy. Our results suggest that some form of pre-training of the policy is required for the joint optimization to repair and verify the policy successfully.
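To make the idea of a martingale certificate concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical one-dimensional linear system with bounded uniform noise, a linear policy u = -gain * x, and a hand-picked candidate certificate V(x) = x^2; all constants and function names are illustrative assumptions. The certificate condition checked here is a common supermartingale-style requirement: outside a target set, the expected value of V after one step must drop by at least a fixed margin. The check runs on a finite grid only, so it can find violations but does not by itself prove stability.

```python
# Hypothetical 1-D system: x' = A*x + B*u + w, with w ~ Uniform(-SIGMA, SIGMA).
# All constants below are illustrative assumptions, not taken from the paper.
A, B, SIGMA = 1.2, 1.0, 0.1
EPS = 0.01       # required expected decrease of V per step
TARGET_R = 0.5   # target set is |x| <= TARGET_R
X_MAX = 2.0      # states are checked on the interval [-X_MAX, X_MAX]


def candidate_v(x):
    """Hand-picked candidate certificate V(x) = x^2 (non-negative everywhere)."""
    return x * x


def expected_next_v(x, gain):
    """Exact E[V(x')] under the linear policy u = -gain * x.

    With c = A - B*gain, x' = c*x + w and E[w] = 0, so
    E[(c*x + w)^2] = (c*x)^2 + Var(w), where Var(w) = SIGMA^2 / 3.
    """
    c = A - B * gain
    return (c * x) ** 2 + SIGMA ** 2 / 3.0


def violations(gain, n_grid=200):
    """Grid states outside the target set where the decrease condition fails."""
    bad = []
    for i in range(n_grid + 1):
        x = -X_MAX + 2.0 * X_MAX * i / n_grid
        if abs(x) <= TARGET_R:
            continue  # the condition is only required outside the target set
        if expected_next_v(x, gain) > candidate_v(x) - EPS:
            bad.append(x)
    return bad


if __name__ == "__main__":
    for gain in (0.0, 1.0):
        print(f"gain={gain}: {len(violations(gain))} violating grid states")
```

In this toy setting the uncontrolled policy (gain 0.0) violates the condition at every checked state, while gain 1.0 satisfies it on the whole grid. A learning procedure of the kind the abstract describes would adjust the policy and the certificate together to remove such violations, and a sound verifier would need to cover the full state space, not just grid points.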