Deep reinforcement learning agents often struggle to coordinate their perception and decision-making components effectively, particularly in environments with high-dimensional sensory inputs where feature relevance varies. This work introduces SPRIG (Stackelberg Perception-Reinforcement learning with Internal Game dynamics), a framework that models the internal perception-policy interaction within a single agent as a cooperative Stackelberg game. In SPRIG, the perception module acts as the leader, strategically processing raw sensory states, while the policy module acts as the follower, making decisions based on the extracted features. SPRIG provides theoretical guarantees through a modified Bellman operator while preserving the benefits of modern policy optimization. Experimental results on the Atari BeamRider environment demonstrate SPRIG's effectiveness: it achieves around 30% higher returns than standard PPO through its game-theoretic balance of feature extraction and decision-making.

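This work addresses the challenge that deep reinforcement learning agents face in effectively coordinating their perception and decision-making components in environments with high-dimensional sensory inputs, particularly where feature relevance varies. The proposed SPRIG framework models the perception-policy interaction inside a single agent as a cooperative Stackelberg game, in which the perception module acts as the leader and processes raw sensory states, while the policy module makes decisions based on the extracted features. Experimental results show that SPRIG performs well on the Atari BeamRider environment, with returns about 30% higher than standard PPO.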
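
The leader-follower structure described above can be illustrated with a toy example. The sketch below is not SPRIG's algorithm: it is a minimal scalar cooperative Stackelberg game, assuming a made-up quadratic payoff shared by both players. All names (`shared_payoff`, `follower_best_response`, `solve_leader`) and constants are hypothetical and chosen only for illustration. The leader commits to an action first, the follower responds optimally, and the leader optimizes while anticipating that response.

```python
def shared_payoff(a, b):
    """Hypothetical payoff shared by leader (a) and follower (b)."""
    return -(b - a) ** 2 - (a - 2.0) ** 2 - 0.5 * b ** 2


def follower_best_response(a):
    """Follower's optimal reply to a fixed leader action.

    Setting d(payoff)/db = -2(b - a) - b = 0 gives b = 2a / 3.
    """
    return 2.0 * a / 3.0


def leader_objective(a):
    """Payoff the leader obtains once the follower has best-responded."""
    return shared_payoff(a, follower_best_response(a))


def solve_leader(a0=0.0, lr=0.1, steps=500, eps=1e-5):
    """Gradient ascent on the leader objective (central finite differences)."""
    a = a0
    for _ in range(steps):
        grad = (leader_objective(a + eps) - leader_objective(a - eps)) / (2 * eps)
        a += lr * grad
    return a


leader_action = solve_leader()
follower_action = follower_best_response(leader_action)
print(f"leader a = {leader_action:.4f}, follower b = {follower_action:.4f}")
```

For this particular payoff the leader objective reduces to -a²/3 - (a - 2)², so the iteration should settle near a = 1.5 with the follower at b = 1.0. In an actual agent the scalar actions would be replaced by the parameters of a perception network and a policy network, and the follower's closed-form reply by policy optimization steps; those details are outside what this sketch covers.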

SPRIG: Stackelberg Perception-Reinforcement Learning with Internal Game Dynamics