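Simultaneous Identification and Optimal Tracking Control of Unknown Continuous Time Nonlinear System With Actuator Constraints Using Critic-Only Integral Reinforcement Learning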

Paper Title

Simultaneous Identification and Optimal Tracking Control of Unknown Continuous Time Nonlinear System With Actuator Constraints Using Critic-Only Integral Reinforcement Learning

Authors

Mishra, Amardeep; Ghosh, Satadal

Abstract

To obviate the need for drift dynamics in adaptive dynamic programming (ADP), integral reinforcement learning (IRL) has been proposed as an alternative formulation of the Bellman equation. However, the control coupling dynamics are still needed to obtain a closed-form expression for the optimal control effort. In addition, an initial stabilizing controller and two sets of neural networks (NNs), known as actor-critic, are required to implement the IRL scheme. In this paper, a stabilizing term in the critic update law is leveraged to avoid the need for an initial stabilizing controller when the IRL framework is used to solve the optimal tracking problem with actuator constraints. With such a term, only one NN is needed to generate optimal control policies in the IRL framework. This critic network is coupled with an experience replay (ER) enhanced identifier to remove the need for control coupling dynamics in the IRL algorithm. The weights of the identifier and critic NNs are updated simultaneously, and it is shown that the ER-enhanced identifier handles parametric variations better than an identifier without ER enhancement. The most salient feature of the novel update law is its variable learning rate, which scales the pace of learning based on the instantaneous Hamilton-Jacobi-Bellman (HJB) error. The variable learning rate in the critic NN, coupled with the ER technique in the identifier NN, helps achieve a tighter residual set for the state error and the NN weight errors, as shown in the uniform ultimate boundedness (UUB) stability proof. Simulation results validate the presented "identifier-critic" NN on a nonlinear system.
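
The abstract notes that a closed-form optimal control expression depends on the control coupling dynamics. As a rough illustration only, the sketch below uses the bounded-input form common in the constrained-input ADP literature, u = -u_max * tanh(R^{-1} g^T grad_V / (2 u_max)), which may differ from the exact law derived in the paper. The function name `saturated_control`, the arguments `grad_V`, `g_hat`, `R_inv`, `u_max`, and all numeric values are hypothetical; `g_hat` stands in for an identifier's estimate of the control coupling matrix and `grad_V` for the gradient supplied by a critic.

```python
import numpy as np

def saturated_control(grad_V, g_hat, R_inv, u_max):
    """Bounded control u = -u_max * tanh(R^{-1} g^T grad_V / (2 * u_max)).

    Assumed textbook form for input-constrained optimal control;
    not necessarily the expression used in the paper.
    """
    # Unconstrained direction R^{-1} g^T grad_V, shape (m,)
    direction = R_inv @ g_hat.T @ grad_V
    # tanh keeps every component of u strictly within [-u_max, u_max]
    return -u_max * np.tanh(direction / (2.0 * u_max))

# Hypothetical 2-state, 1-input example
grad_V = np.array([4.0, -1.0])    # critic's value-function gradient (assumed)
g_hat = np.array([[0.0], [1.0]])  # identifier's estimate of g(x) (assumed)
R_inv = np.array([[1.0]])         # inverse of the control weighting matrix
u = saturated_control(grad_V, g_hat, R_inv, u_max=0.5)
```

Because the `tanh` saturates, the returned control never exceeds `u_max` in magnitude however large the critic gradient becomes, which is how an actuator constraint of this kind can be respected by construction.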

