We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game, we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work demonstrate the feasibility of this approach for small infrastructures but do not scale to realistic scenarios, because their computational complexity grows exponentially with the size of the infrastructure. We address this problem by introducing a method that recursively decomposes the game into subgames that can be solved in parallel. Applying optimal stopping theory, we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game, we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.
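The abstract does not spell out DFSP's update rule. To give a rough feel for what "fictitious self-play that learns Nash equilibria through stochastic approximation" means, the sketch below runs classical fictitious play on a toy two-player zero-sum matrix game. This is not DFSP: there is no decomposition into subgames, no partial observability, and best responses are computed exactly rather than learned. The function name `fictitious_play` and the example payoff matrices are hypothetical, chosen only for illustration. Each iteration, both players best-respond to the opponent's empirical average strategy, and each average then moves a step of size 1/(k+1) toward the new best response, which is the stochastic-approximation form of the averaging.

```python
import numpy as np


def fictitious_play(payoff, iterations=10000):
    """Classical (simultaneous) fictitious play on a zero-sum matrix game.

    Hypothetical illustration only, not the DFSP algorithm.
    payoff[i, j] is the row player's (e.g. defender's) reward when the row
    player plays action i and the column player (e.g. attacker) plays action j;
    the column player receives -payoff[i, j].
    """
    n_rows, n_cols = payoff.shape
    # Empirical average strategies, initialized to uniform distributions.
    row_avg = np.full(n_rows, 1.0 / n_rows)
    col_avg = np.full(n_cols, 1.0 / n_cols)
    for k in range(1, iterations + 1):
        # Row player: pure best response maximizing expected reward
        # against the column player's current average strategy.
        row_br = np.zeros(n_rows)
        row_br[np.argmax(payoff @ col_avg)] = 1.0
        # Column player: pure best response minimizing the row player's
        # expected reward against the row player's current average strategy.
        col_br = np.zeros(n_cols)
        col_br[np.argmin(row_avg @ payoff)] = 1.0
        # Stochastic-approximation update: each average strategy moves a
        # shrinking step of size 1/(k+1) toward the latest best response.
        step = 1.0 / (k + 1)
        row_avg += step * (row_br - row_avg)
        col_avg += step * (col_br - col_avg)
    return row_avg, col_avg


if __name__ == "__main__":
    # Toy "matching pennies" game: the unique equilibrium is (0.5, 0.5)
    # for both players.
    pennies = np.array([[1.0, -1.0], [-1.0, 1.0]])
    row, col = fictitious_play(pennies)
    print("row average strategy:", row)
    print("column average strategy:", col)
```

In zero-sum games the empirical averages of fictitious play converge to a Nash equilibrium, although the individual best responses keep oscillating; in the matching-pennies demo both averages approach (0.5, 0.5). The threshold structure mentioned in the abstract would matter at the best-response step: when best responses are known to be threshold strategies, that step reduces to searching over thresholds instead of over arbitrary strategies.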

Scalable Learning of Intrusion Responses through Recursive Decomposition