Data efficiency is a key challenge for deep reinforcement learning. We
address this problem by using unlabeled data to pretrain an encoder which is
then finetuned on a small amount of task-specific data. To encourage learning
representations which capture diverse aspects of the underlying MDP, we employ
a combination of latent dynamics modelling and unsupervised goal-conditioned
RL. When limited to 100k steps of interaction on Atari games (equivalent to two
hours of human experience), our approach significantly surpasses prior work
combining offline representation pretraining with task-specific finetuning, and
compares favourably with other pretraining methods that require orders of
magnitude more data. Our approach shows particular promise when combined with
larger models as well as more diverse, task-aligned observational data,
approaching human-level performance and data efficiency on Atari in our best
setting. We provide code associated with this work at
this https URL
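
The equivalence between 100k interaction steps and roughly two hours of human experience can be checked with a short calculation. The sketch below is illustrative only: it assumes a frame skip of 4 emulator frames per agent step and a 60 frames-per-second emulator, the conventional Atari settings, neither of which is stated in the text above. The function name `steps_to_hours` is hypothetical.

```python
# Assumptions (not stated in the abstract): each agent step repeats its
# action for 4 emulator frames, and the emulator runs at 60 frames per second.
AGENT_STEPS = 100_000
FRAMES_PER_STEP = 4      # assumed frame skip
FRAMES_PER_SECOND = 60   # assumed emulator frame rate


def steps_to_hours(steps: int,
                   frames_per_step: int = FRAMES_PER_STEP,
                   fps: int = FRAMES_PER_SECOND) -> float:
    """Convert a number of agent steps to hours of real-time gameplay."""
    total_frames = steps * frames_per_step
    return total_frames / fps / 3600


hours = steps_to_hours(AGENT_STEPS)
print(f"{hours:.2f} hours")  # prints "1.85 hours"
```

Under these assumptions, 100k steps correspond to 400k frames, or about 1.85 hours of play, which is consistent with the "two hours" figure quoted above.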

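An encoder is pretrained on unlabeled data and then finetuned on a small amount of task-specific data. Latent dynamics modelling and unsupervised goal-conditioned RL encourage the learned representations to capture multiple aspects of the underlying MDP. The method is highly data-efficient and achieves state-of-the-art performance relative to prior work and to other pretraining methods that require orders of magnitude more data.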