Accelerating learning processes for complex tasks by leveraging previously
learned tasks has been one of the most challenging problems in reinforcement
learning, especially when the similarity between source and target tasks is
low. This work proposes the REPresentation And INstance Transfer (REPAINT)
algorithm for knowledge transfer in deep reinforcement learning. REPAINT not
only transfers the representation of a pre-trained teacher policy during
on-policy learning, but also uses an advantage-based experience selection
approach to transfer useful samples collected by following the teacher policy
during off-policy learning. Our experimental results on several benchmark tasks
show that REPAINT significantly reduces the total training time across generic
cases of task similarity. In particular, when the source tasks are dissimilar
to, or sub-tasks of, the target tasks, REPAINT outperforms other baselines in
both training-time reduction and asymptotic return.
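
To make the idea of advantage-based experience selection concrete, the following is a minimal sketch rather than the algorithm as specified in this work. It assumes a one-step temporal-difference advantage estimate and a fixed filtering threshold; the function names, the `threshold` and `gamma` parameters, and the choice of estimator are all illustrative assumptions. Samples collected under the teacher policy are kept only if their estimated advantage exceeds the threshold.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def one_step_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    next_values: Sequence[float],
    gamma: float = 0.99,
) -> List[float]:
    """One-step TD advantage estimate: r + gamma * V(s') - V(s).

    Assumption: the value estimates come from the critic being trained on
    the target task; the estimator itself is chosen here for simplicity.
    """
    if not (len(rewards) == len(values) == len(next_values)):
        raise ValueError("rewards, values and next_values must have equal length")
    return [r + gamma * nv - v for r, v, nv in zip(rewards, values, next_values)]


def select_by_advantage(
    samples: Sequence[T],
    advantages: Sequence[float],
    threshold: float = 0.0,
) -> List[T]:
    """Keep only the samples whose advantage is strictly above `threshold`.

    `threshold` is a hypothetical hyperparameter introduced for this sketch.
    """
    if len(samples) != len(advantages):
        raise ValueError("samples and advantages must have equal length")
    return [s for s, a in zip(samples, advantages) if a > threshold]
```

In this sketch, raising the threshold discards more teacher-collected samples, so only transitions that look clearly beneficial under the current value estimates would be reused for the off-policy update.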

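This study proposes REPAINT, a knowledge transfer algorithm for deep reinforcement learning. It not only transfers the representation of a pre-trained model in on-policy learning, but also uses an advantage-based experience selection method to transfer useful samples collected by following the pre-trained model in off-policy learning. Experimental results show that REPAINT significantly shortens the total training time in general cases where task similarity is low; in particular, when the source task is dissimilar to, or a sub-task of, the target task, REPAINT outperforms other baselines in both training-time reduction and asymptotic return.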