A Generative Machine Learning Approach to Policy Optimization in Pursuit-Evasion Games

Paper Title

A Generative Machine Learning Approach to Policy Optimization in Pursuit-Evasion Games

Authors

Shiva Navabi, Osonde A. Osoba

Abstract

We consider a pursuit-evasion game [11] played between two agents, 'Blue' (the pursuer) and 'Red' (the evader), over $T$ time steps. Red aims to attack Blue's territory. Blue's objective is to intercept Red by time $T$ and thereby limit the success of Red's attack. Blue must plan its pursuit trajectory by choosing parameters that determine its course of movement (speed and angle in our setup) such that it intercepts Red by time $T$. We show that Blue's path-planning problem in pursuing Red can be posed as a sequential decision-making problem under uncertainty. Blue's unawareness of Red's action policy renders the analytic dynamic programming approach intractable for finding the optimal action policy for Blue. In this work, we are interested in exploring data-driven approaches to the policy optimization problem that Blue faces. We apply generative machine learning (ML) approaches to learn optimal action policies for Blue. This highlights the ability of generative ML models to learn the relevant implicit representations for the dynamics of simulated pursuit-evasion games. We demonstrate the effectiveness of our modeling approach via extensive statistical assessments. This work can be viewed as a preliminary step towards further adoption of generative modeling approaches for addressing policy optimization problems that arise in the context of multi-agent learning and planning [1].
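
To make the setup concrete, here is a minimal sketch of how one candidate choice of Blue's parameters (speed and angle) could be checked against a simulated Red trajectory. The abstract does not specify the game dynamics, so everything here is an assumption for illustration: Blue moves in a straight line at constant speed, interception means coming within a hypothetical `capture_radius` of Red, and the function names `blue_position` and `intercepts` are invented for this example. It is not the authors' simulator or learning method.

```python
import math

def blue_position(start, speed, angle, t):
    """Blue's position after t steps, assuming constant speed and heading (radians)."""
    x0, y0 = start
    return (x0 + speed * t * math.cos(angle),
            y0 + speed * t * math.sin(angle))

def intercepts(blue_start, speed, angle, red_path, capture_radius=0.5):
    """Return the first time step at which Blue is within capture_radius
    of Red, or None if no interception occurs along red_path.

    red_path is a list of Red's (x, y) positions at t = 0, 1, ..., T
    (an assumed, pre-simulated trajectory).
    """
    for t, (rx, ry) in enumerate(red_path):
        bx, by = blue_position(blue_start, speed, angle, t)
        if math.hypot(bx - rx, by - ry) <= capture_radius:
            return t
    return None

# Hypothetical scenario: Red heads toward the origin along the x-axis, T = 10.
red_path = [(10.0 - t, 0.0) for t in range(11)]
head_on = intercepts((0.0, 0.0), speed=1.0, angle=0.0, red_path=red_path)
sideways = intercepts((0.0, 0.0), speed=1.0, angle=math.pi / 2, red_path=red_path)
```

In the problem the abstract describes, Blue does not know Red's policy, so a check like this can only score candidate parameters against sampled or simulated Red trajectories; the generative model's role is to propose good parameters without that knowledge.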

Paper Title