Learning Security Strategies through Game Play and Optimal Stopping

Paper title

Learning Security Strategies through Game Play and Optimal Stopping

Authors

Kim Hammar, Rolf Stadler

Abstract

We study automated intrusion prevention using reinforcement learning. Following a novel approach, we formulate the interaction between an attacker and a defender as an optimal stopping game and let attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic perspective allows us to find defender strategies that are effective against dynamic attackers. The optimal stopping formulation gives us insight into the structure of optimal strategies, which we show to have threshold properties. To obtain the optimal defender strategies, we introduce T-FP, a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. Our overall method for learning and evaluating strategies includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are produced that drive simulation runs and where learned strategies are evaluated. We conclude that this approach can produce effective defender strategies for a practical IT infrastructure.
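To illustrate the general idea of fictitious play that the abstract refers to, here is a minimal sketch of classical fictitious play on a two-player zero-sum matrix game (matching pennies). It is not T-FP: the abstract describes T-FP as a fictitious self-play algorithm that uses stochastic approximation and relies on threshold properties of optimal strategies, none of which is reproduced here. The function name `fictitious_play`, the payoff matrix, and the iteration count are assumptions chosen purely for illustration.

```python
def fictitious_play(payoff, iterations=10000):
    """Classical fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff when the row player picks i and
    the column player picks j; the column player receives -payoff[i][j].
    Returns the empirical action frequencies of both players.
    (Illustrative sketch only; this is not the T-FP algorithm.)
    """
    n_rows, n_cols = len(payoff), len(payoff[0])

    # Arbitrary initial belief: each player has been seen playing action 0 once.
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(iterations):
        # Value of each row action against the column player's empirical counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Value of each column action against the row player's empirical counts.
        col_values = [
            sum(-payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]

        # Both players best-respond simultaneously (ties go to the lowest index).
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    row_total = sum(row_counts)
    col_total = sum(col_counts)
    row_freq = [c / row_total for c in row_counts]
    col_freq = [c / col_total for c in col_counts]
    return row_freq, col_freq


if __name__ == "__main__":
    # Matching pennies: the unique Nash equilibrium mixes both actions equally.
    matching_pennies = [[1, -1], [-1, 1]]
    row_freq, col_freq = fictitious_play(matching_pennies)
    print("row player frequencies:", row_freq)
    print("column player frequencies:", col_freq)
```

In this toy game the empirical frequencies of both players drift toward the equal mix of the two actions, which is the Nash equilibrium of matching pennies. The stopping game in the paper is a richer, sequential setting, so this sketch only conveys the best-response-to-empirical-play loop.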
