Reinforcement learning (RL) is a powerful tool for solving complex
decision-making problems, but its lack of transparency and interpretability has
been a major challenge in domains where decisions have significant real-world
consequences. In this paper, we propose a novel Advantage Actor-Critic with
Reasoner (A2CR), which can be easily applied to Actor-Critic-based RL models
and make them interpretable. A2CR consists of three interconnected networks:
the Policy Network, the Value Network, and the Reasoner Network. By predefining
and classifying the underlying purpose of the actor's actions, A2CR
automatically generates a more comprehensive and interpretable paradigm for
understanding the agent's decision-making process. It offers a range of
functionalities such as purpose-based saliency, early failure detection, and
model supervision, thereby promoting responsible and trustworthy RL.
Evaluations conducted in action-rich Super Mario Bros environments yield
intriguing findings: Reasoner-predicted label proportions decrease for
``Breakout'' and increase for ``Hovering'' as the exploration level of the RL
algorithm intensifies. Additionally, purpose-based saliencies are more focused
and comprehensible.
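
As a rough illustration of the label-proportion measurement mentioned above, the following minimal sketch tallies the fraction of time steps assigned to each purpose label. The function name `label_proportions` and the sample label sequences are hypothetical and made up for illustration; they are not the paper's code or data, and the actual label set may contain more than the two labels named here.

```python
from collections import Counter


def label_proportions(predicted_labels):
    """Return the fraction of time steps assigned to each purpose label."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        # No predictions: return an empty mapping rather than dividing by zero.
        return {}
    return {label: count / total for label, count in counts.items()}


# Hypothetical Reasoner outputs for two runs (made-up data, illustration only).
low_exploration = ["Breakout", "Breakout", "Breakout", "Hovering"]
high_exploration = ["Breakout", "Hovering", "Hovering", "Hovering"]

low = label_proportions(low_exploration)
high = label_proportions(high_exploration)
```

Comparing such proportions across runs with different exploration settings is one simple way to observe the kind of trend the abstract reports.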

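This paper introduces A2CR, a new interpretable Actor-Critic reinforcement learning model. By predefining and classifying the purposes behind actions, A2CR automatically produces a more comprehensive and interpretable view of the decision-making process, offering functionalities such as purpose-based saliency, early failure detection, and model supervision to promote responsible and trustworthy reinforcement learning. Evaluations in action-rich Super Mario Bros environments show that, as the exploration level of the RL algorithm increases, the proportion of Reasoner-predicted labels decreases for ``Breakout'' and increases for ``Hovering''. In addition, purpose-based saliencies are more focused and comprehensible.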