Green Security Games (GSGs) have been proposed and applied to optimize patrols conducted by law enforcement agencies in green security domains such as combating poaching, illegal logging and overfishing. However, real-time information such as footprints and agents' subsequent actions upon receiving the information, e.g., rangers following the footprints to chase the poacher, have been neglected in previous work. To fill the gap, we first propose a new game model GSG-I which augments GSGs with sequential movement and the vital element of real-time information. Second, we design a novel deep reinforcement learning-based algorithm, DeDOL, to compute a patrolling strategy that adapts to the real-time information against a best-responding attacker. DeDOL is built upon the double oracle framework and the policy-space response oracle, solving a restricted game and iteratively adding best response strategies to it through training deep Q-networks. Exploring the game structure, DeDOL uses domain-specific heuristic strategies as initial strategies and constructs several local modes for efficient and parallelized training. To our knowledge, this is the first attempt to use Deep Q-Learning for security games.

本研究提出了一种新的游戏模型GSG-I，结合了顺序移动和实时信息等关键元素，设计了基于双预言机框架和策略空间响应预言机的深度强化学习算法DeDOL来计算巡逻策略，以对抗最佳响应的攻击者，探索游戏结构使用领域特定启发式策略和构建多个局部模态以进行高效和并行化训练。这是首次尝试将深度Q-Learning应用于安全游戏。

基于实时信息的绿色安全游戏的深度强化学习