Accelerating the learning of complex tasks by leveraging previously learned tasks has been one of the most challenging problems in reinforcement learning, especially when the similarity between source and target tasks is low or unknown. In this work, we propose a REPresentation-And-INstance Transfer algorithm (REPAINT) for the deep actor-critic reinforcement learning paradigm. In representation transfer, we adopt a kickstarted training method that uses a pre-trained teacher policy by introducing an auxiliary cross-entropy loss. In instance transfer, we develop a sampling approach, i.e., advantage-based experience replay, on transitions collected following the teacher policy, where only the samples with high advantage estimates are retained for the policy update. We consider both learning an unseen target task by transferring from previously learned teacher tasks and learning a partially unseen task composed of multiple sub-tasks by transferring from a pre-learned teacher sub-task. In several benchmark experiments, REPAINT significantly reduces the total training time and improves the asymptotic performance compared to training with no prior knowledge and other baselines.
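To make the representation-transfer idea concrete, the following is a minimal sketch of an auxiliary cross-entropy term between a teacher policy and a student policy for a single state with discrete actions. It is an illustration under stated assumptions, not the authors' implementation: the function names (`log_softmax`, `kickstarting_loss`, `total_loss`), the weighting coefficient `beta`, and the way the term is added to the reinforcement learning loss are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_norm for x in logits]


def kickstarting_loss(teacher_logits, student_logits):
    """Cross-entropy H(teacher, student) for one state.

    The loss is small when the student assigns high probability to the
    actions the teacher prefers.
    """
    teacher_probs = [math.exp(lp) for lp in log_softmax(teacher_logits)]
    student_log_probs = log_softmax(student_logits)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


def total_loss(rl_loss, teacher_logits, student_logits, beta):
    """Hypothetical combination: RL loss plus a weighted auxiliary term.

    `beta` is an assumed weight; setting it to 0 recovers the plain RL loss.
    """
    return rl_loss + beta * kickstarting_loss(teacher_logits, student_logits)
```

In a sketch like this, the weight `beta` would typically be reduced over training so that the student eventually optimizes the target-task objective alone; the schedule is left out here because the abstract does not specify one.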

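This work proposes REPAINT, a knowledge transfer algorithm for deep reinforcement learning. It not only transfers the representation of a pre-trained model during on-policy learning, but also uses an advantage-based experience selection method to transfer useful samples, collected by following the pre-trained model, during off-policy learning. Experimental results show that REPAINT significantly reduces the total training time in the general case where task similarity is low. In particular, when the source task is dissimilar to the target task or is one of its sub-tasks, REPAINT outperforms the other baselines in both training-time reduction and the asymptotic performance of the return.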
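The advantage-based experience selection described above can be sketched as a simple filter over transitions collected with the teacher policy. This is only an illustrative sketch: the `Transition` container, the tabular `value` lookup, the one-step TD advantage estimate, and the `threshold` and `gamma` parameters are all assumptions introduced here, and the original work may estimate advantages differently.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    """One teacher-collected step (hypothetical container for this sketch)."""
    state: int
    action: int
    reward: float
    next_state: int
    done: bool


def one_step_advantage(t, value, gamma=0.99):
    """One-step TD advantage: r + gamma * V(s') - V(s).

    `value` maps a state to the student's current value estimate
    (assumed tabular here for simplicity).
    """
    bootstrap = 0.0 if t.done else gamma * value[t.next_state]
    return t.reward + bootstrap - value[t.state]


def filter_by_advantage(transitions, value, threshold=0.0, gamma=0.99):
    """Keep only transitions whose advantage estimate exceeds `threshold`."""
    return [
        t for t in transitions
        if one_step_advantage(t, value, gamma) > threshold
    ]
```

For example, with `value = {0: 1.0, 1: 0.5, 2: 0.0}`, a transition from state 0 to state 1 with reward 2.0 has a positive advantage and is kept, while a terminal transition out of state 1 with reward 0.0 has a negative advantage and is dropped. Only the retained transitions would then be used for the off-policy update.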

Knowledge Transfer in Deep Reinforcement Learning (REPAINT)