While reinforcement learning (RL) algorithms achieve state-of-the-art performance on a variety of challenging tasks, they are prone to catastrophic forgetting or interference when faced with a lifelong stream of information. In this paper, we propose a scalable lifelong RL method that dynamically expands the network capacity to accommodate new knowledge while preventing past memories from being perturbed. We use a Dirichlet process mixture to model the non-stationary task distribution; it captures task relatedness by estimating the likelihood of task-to-cluster assignments and clusters the task models in a latent space. We formulate the prior distribution of the mixture as a Chinese restaurant process (CRP), which instantiates new mixture components as needed. The update and expansion of the mixture are governed by the Bayesian non-parametric framework with an expectation maximization (EM) procedure, which dynamically adapts the model complexity without explicit task boundaries or heuristics. Moreover, we use domain randomization to train robust prior parameters for initializing each task model in the mixture, so that the resulting model can better generalize and adapt to unseen tasks. Through extensive experiments on robot navigation and locomotion domains, we show that our method successfully facilitates scalable lifelong RL and outperforms relevant existing methods.
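
To make the role of the CRP prior concrete, the following is a minimal sketch of how a new task could be softly assigned to existing clusters or to a new one. It is not the authors' implementation: the function names, the concentration parameter `alpha`, the per-cluster task counts `counts`, and the log-likelihood inputs `log_liks` are all assumptions introduced for illustration.

```python
import math


def crp_prior(counts, alpha):
    """CRP prior over existing clusters plus one potential new cluster.

    counts[k] is the number of tasks already assigned to cluster k (assumed > 0);
    alpha > 0 is the concentration parameter. Both names are hypothetical.
    """
    if alpha <= 0 or any(c <= 0 for c in counts):
        raise ValueError("alpha and all counts must be positive")
    total = sum(counts) + alpha
    # Existing cluster k gets mass proportional to its size; a new cluster gets alpha.
    return [c / total for c in counts] + [alpha / total]


def assignment_posterior(log_liks, counts, alpha):
    """E-step-style posterior over cluster assignments for one new task.

    log_liks has len(counts) + 1 entries: the log-likelihood of the task's data
    under each existing cluster model and, last, under a freshly initialized
    model (assumed here to come from the robust prior parameters).
    """
    prior = crp_prior(counts, alpha)
    if len(log_liks) != len(prior):
        raise ValueError("log_liks must have one entry per cluster plus one")
    log_post = [math.log(p) + ll for p, ll in zip(prior, log_liks)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post)
    weights = [math.exp(v - m) for v in log_post]
    z = sum(weights)
    return [w / z for w in weights]
```

For example, with two clusters holding 3 and 1 tasks and `alpha = 1.0`, `crp_prior([3, 1], 1.0)` puts mass 0.6 and 0.2 on the existing clusters and 0.2 on a new one. If the new task's data is far better explained by a freshly initialized model, the last entry of `assignment_posterior` dominates, which corresponds to instantiating a new mixture component.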

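This paper presents a scalable lifelong reinforcement learning method. It models the non-stationary task distribution with a Dirichlet process mixture, updates and expands the model dynamically using a Bayesian approach with the EM algorithm, and uses domain randomization to train robust prior parameters so that the model generalizes and adapts better to unseen tasks. Experiments on navigation and locomotion domains show that the method achieves scalable lifelong RL and outperforms relevant existing methods.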

A Dirichlet Process Mixture of Robust Task Models for Scalable Lifelong Reinforcement Learning