Paper Title
Recursive Regret Matching: A General Method for Solving Time-invariant Nonlinear Zero-sum Differential Games
Paper Authors
Paper Abstract
This paper proposes a new method for computing the rolling Nash equilibrium of time-invariant nonlinear two-person zero-sum differential games. The idea is to discretize time so that the differential game becomes a sequential game with several steps. By introducing a state-value function, the sequential game is then transformed into a recursion consisting of several normal-form games. Finally, each normal-form game is solved with action abstraction and regret matching. To improve the real-time performance of the proposed method, the state-value function can be kept in memory. The method handles both the case where a saddle point exists and the case where it does not, so an analysis of saddle-point existence can be avoided. If no saddle point exists, a mixed optimal control pair is obtained instead. Several examples at the end of the paper illustrate the validity of the proposed method.
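The regret-matching step named in the abstract can be illustrated on a single normal-form game. The following is a minimal sketch of standard regret matching for a two-player zero-sum matrix game, not the authors' implementation. The function name `regret_matching`, the iteration count, and the 2x2 payoff matrix in the usage note are hypothetical. In the recursion the abstract describes, each payoff entry would presumably combine a stage cost with the stored state value of the successor state, which is an assumption here and is not shown.

```python
def regret_matching(payoff, iterations=20000):
    """Approximate a mixed equilibrium of a zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row action i meets
    column action j; the column player receives the negative.
    Returns the time-averaged strategies (row, column).
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret = [0.0] * n_rows
    col_regret = [0.0] * n_cols
    row_sum = [0.0] * n_rows
    col_sum = [0.0] * n_cols

    def strategy(regret):
        # Play each action in proportion to its positive cumulative regret;
        # fall back to the uniform strategy when no regret is positive.
        positive = [max(r, 0.0) for r in regret]
        total = sum(positive)
        if total > 0.0:
            return [p / total for p in positive]
        return [1.0 / len(regret)] * len(regret)

    for _ in range(iterations):
        x = strategy(row_regret)
        y = strategy(col_regret)

        # Expected payoff of every pure action against the opponent's
        # current mixed strategy.
        row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_expected = sum(x[i] * row_values[i] for i in range(n_rows))
        col_expected = sum(y[j] * col_values[j] for j in range(n_cols))

        # Accumulate regrets and the strategies that were played.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_expected
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_expected
            col_sum[j] += y[j]

    # The averaged strategies, not the final iterates, approach equilibrium.
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])
```

For the hypothetical payoff matrix `[[2.0, -1.0], [-1.0, 1.0]]`, no pure saddle point exists and the mixed equilibrium puts probability 0.4 on the first action for both players, so `regret_matching([[2.0, -1.0], [-1.0, 1.0]])` should return averaged strategies close to `[0.4, 0.6]` for each player. This mirrors the case in the abstract where only a mixed optimal pair can be obtained.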