Deep Reinforcement Learning (DRL) has achieved remarkable success in
sequential decision-making problems. However, existing DRL agents make
decisions in an opaque fashion, hindering the user from establishing trust and
scrutinizing weaknesses of the agents. While recent research has developed
Interpretable Policy Extraction (IPE) methods for explaining how an agent takes
actions, their explanations are often inconsistent with the agent's behavior
and thus frequently fail to explain it. To tackle this issue, we propose a novel
method, Fidelity-Induced Policy Extraction (FIPE). Specifically, we start by
analyzing the optimization mechanism of existing IPE methods, showing that
they ignore consistency while increasing cumulative rewards. We then
design a fidelity-induced mechanism by integrating a fidelity measurement into
the reinforcement learning feedback. We conduct experiments in the complex
control environment of StarCraft II, an arena typically avoided by current IPE
methods. The experimental results demonstrate that FIPE outperforms the baselines
in terms of interaction performance and consistency while remaining easy to
understand.
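The abstract does not specify how fidelity is measured or how it enters the feedback signal. The following is a minimal sketch, assuming fidelity is the fraction of states where the extracted interpretable policy picks the same action as the original DRL agent, and that agreement is added to the environment reward as a bonus weighted by a hypothetical coefficient `lam`. The function names are illustrative and are not FIPE's actual formulation.

```python
from typing import Sequence


def fidelity(student_actions: Sequence[int], teacher_actions: Sequence[int]) -> float:
    """Fraction of states where the interpretable (student) policy
    chooses the same action as the DRL (teacher) agent.
    Assumed definition, for illustration only."""
    if len(student_actions) != len(teacher_actions):
        raise ValueError("action sequences must have equal length")
    if not student_actions:
        return 0.0
    matches = sum(s == t for s, t in zip(student_actions, teacher_actions))
    return matches / len(student_actions)


def fidelity_shaped_reward(env_reward: float, student_action: int,
                           teacher_action: int, lam: float = 0.5) -> float:
    """Hypothetical fidelity-induced feedback: the environment reward
    plus a bonus of `lam` whenever the student agrees with the teacher."""
    agreement = 1.0 if student_action == teacher_action else 0.0
    return env_reward + lam * agreement
```

For example, `fidelity([0, 1, 2, 1], [0, 1, 1, 1])` returns `0.75`, and `fidelity_shaped_reward(1.0, 2, 2)` returns `1.5`. A larger `lam` trades cumulative reward for consistency with the original agent.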

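By introducing a fidelity measurement and integrating it into the reinforcement learning feedback, FIPE outperforms existing methods in interpretability and consistency, and experiments show that it achieves good performance and understandability in complex control environments.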