In Stackelberg security games, when information about the attacker's payoffs
is uncertain, algorithms have been proposed to learn the optimal defender
commitment by interacting with the attacker and observing their best responses.
In this paper, however, we show that these algorithms c
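This paper studies automated intrusion prevention using reinforcement learning. We formalize the interaction between an attacker and a defender as an optimal stopping game and let attack and defense strategies evolve through reinforcement learning and self-play. This allows us to find defender strategies that are effective against a dynamic attacker, and we learn Nash equilibria by introducing T-FP, a fictitious self-play algorithm. We find that our overall approach can produce effective defender strategies for a practical IT infrastructure.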