Transfer Learning (TL) has shown great potential to accelerate Reinforcement Learning (RL) by leveraging prior knowledge from past learned policies of relevant tasks. Existing transfer approaches either explicitly computes the similarity between tasks or select appropriate source policies to provide guided explorations for the target task. However, how to directly optimize the target policy by alternatively utilizing knowledge from appropriate source policies without explicitly measuring the similarity is currently missing. In this paper, we propose a novel Policy Transfer Framework (PTF) to accelerate RL by taking advantage of this idea. Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it by modeling multi-policy transfer as the option learning problem. PTF can be easily combined with existing deep RL approaches. Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods in terms of learning efficiency and final performance in both discrete and continuous action spaces.

本研究提出了一种名为“Policy Transfer Framework”的框架，该框架采用多策略转移方式对强化学习中的目标策略进行直接优化，可以很方便地与现有的深度强化学习方法相结合，实验结果表明，该框架明显加速了学习过程，并在离散和连续动作空间中超越了现有的策略转移方法，具有较高的学习效率和最终性能。

自适应策略转移的高效深度强化学习