This paper addresses the problem of integrating local guide policies into a
Reinforcement Learning agent. To this end, we show how to adapt existing
algorithms to this setting before introducing a novel algorithm based on a
noisy policy-switching procedure. This approach builds on a proper Approximate
Policy Evaluation (APE) scheme to provide a perturbation that carefully leads
the local guides towards better actions. We evaluated our method on a set of
classical Reinforcement Learning problems, including safety-critical systems
in which entering certain areas risks triggering catastrophic consequences.
In all the proposed environments, our agent proved efficient at leveraging
the guide policies to improve the performance of any APE-based Reinforcement
Learning algorithm, especially in its early learning stages.

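This paper studies how to integrate local guide policies into a Reinforcement Learning agent. It proposes an algorithm based on noisy policy switching and uses a proper Approximate Policy Evaluation scheme to steer the local guides towards better actions, thereby improving the performance of Reinforcement Learning algorithms in settings such as safety-critical systems.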