Humans and animals can learn complex predictive models that allow them to
accurately and reliably reason about real-world phenomena, and they can adapt
such models extremely quickly in the face of unexpected changes. Deep neural
network models allow us to represent very complex functions, but lack this
capacity for rapid online adaptation. The goal in this paper is to develop a
method for continual online learning from an incoming stream of data, using
deep neural network models. We formulate an online learning procedure that uses
stochastic gradient descent (SGD) to update model parameters, and that builds
and maintains a mixture of models for non-stationary task distributions using
an expectation-maximization (EM) algorithm with a Chinese restaurant process
(CRP) prior. This
allows for all models to be adapted as necessary, with new models instantiated
for task changes and old models recalled when previously seen tasks are
encountered again. Furthermore, we observe that a model can be meta-trained so
that this direct online adaptation with SGD is effective, which is otherwise
not the case for large function approximators. In
this work, we apply our meta-learning for online learning (MOLe) approach to
model-based reinforcement learning, where adapting the predictive model is
critical for control; we demonstrate that MOLe outperforms prior methods and
enables effective continuous adaptation in non-stationary task
distributions such as varying terrains, motor failures, and unexpected
disturbances.
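
The online procedure described above can be illustrated with a small sketch. It
is a simplified illustration under stated assumptions, not the paper's
implementation: each task model is a linear regressor rather than a deep
network, a fixed weight vector `prior_w` stands in for the meta-learned
initialization, the likelihood is assumed Gaussian, and the names `crp_prior`,
`online_step`, `alpha`, `sigma`, and `lr` are hypothetical. Each step computes
a posterior over tasks under a Chinese restaurant process prior (E-step),
instantiates a new model when the new-task hypothesis is the most probable, and
then applies posterior-weighted SGD updates to all models (M-step).

```python
import math


def crp_prior(counts, alpha):
    """Chinese restaurant process prior over existing tasks plus one new task."""
    total = sum(counts) + alpha
    return [c / total for c in counts] + [alpha / total]


def predict(w, x):
    """Linear prediction; a stand-in for a deep network forward pass."""
    return sum(wi * xi for wi, xi in zip(w, x))


def log_likelihood(w, x, y, sigma):
    """Gaussian log-likelihood of y under the linear model w (assumed noise sigma)."""
    resid = (y - predict(w, x)) / sigma
    return -0.5 * resid * resid - math.log(sigma * math.sqrt(2.0 * math.pi))


def sgd_step(w, x, y, lr, weight=1.0):
    """One squared-error gradient step, scaled by a posterior weight."""
    err = predict(w, x) - y
    return [wi - lr * weight * err * xi for wi, xi in zip(w, x)]


def online_step(models, counts, prior_w, x, y, alpha=1.0, sigma=0.5, lr=0.5):
    """Process one (x, y) pair; mutates models and counts, returns the task posterior."""
    # Candidate new-task model: one SGD step from the (assumed) prior weights.
    candidate = sgd_step(prior_w, x, y, lr)

    # E-step: posterior over existing tasks and the candidate new task.
    prior = crp_prior(counts, alpha)
    log_post = [
        math.log(p) + log_likelihood(w, x, y, sigma)
        for p, w in zip(prior, models + [candidate])
    ]
    top = max(log_post)
    unnorm = [math.exp(lp - top) for lp in log_post]
    norm = sum(unnorm)
    post = [u / norm for u in unnorm]

    if post.index(max(post)) == len(models):
        # The new-task hypothesis is the most probable: instantiate it.
        models.append(candidate)
        counts.append(0.0)
    else:
        # Otherwise drop the candidate and renormalize over existing tasks.
        kept = sum(post[:-1])
        post = [p / kept for p in post[:-1]]

    # M-step: posterior-weighted SGD update of every model; update expected counts.
    for i in range(len(models)):
        models[i] = sgd_step(models[i], x, y, lr, weight=post[i])
        counts[i] += post[i]
    return post
```

Calling `online_step` once per incoming `(x, y)` pair, starting from empty
`models` and `counts` lists, grows the mixture as needed. On a toy stream whose
slope switches from 2 to -3 and back to 2, this sketch should instantiate a
second model at the first switch and recall the first model at the second,
rather than overwriting it.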

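This paper aims to develop a method for continual online learning from an
incoming stream of data using deep neural network models. The method uses
stochastic gradient descent to update model parameters, and an
expectation-maximization algorithm with a Chinese restaurant process prior to
build and maintain a mixture of models that handles non-stationary task
distributions. We apply meta-learning to model-based reinforcement learning,
enabling continual, rapid adaptation of the predictive model in control tasks
where such adaptation is critical.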