Offline reinforcement learning leverages pre-collected datasets of
transitions to train policies. It can serve as an effective initialization for
online algorithms, enhancing sample efficiency and speeding up convergence.
However, when such datasets are limited in size and quality, offline
pre-training can produce sub-optimal policies and lead to degraded online
reinforcement learning performance. In this paper, we propose a model-based data
augmentation strategy to maximize the benefits of offline reinforcement
learning pre-training and reduce the amount of data it needs to be effective. Our
approach leverages a world model of the environment trained on the offline
dataset to augment states during offline pre-training. We evaluate our approach
on a variety of MuJoCo robotic tasks, and our results show that it can jump-start
online fine-tuning and substantially reduce the required number of environment
interactions, in some cases by an order of magnitude.
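
To make the idea of model-based augmentation concrete, the following is a minimal sketch and not the paper's implementation. It assumes a simple linear dynamics model fitted by least squares as a stand-in for the learned world model, and it treats noisy model predictions as the augmented states. The function names, the `noise_scale` parameter, and the linear form are all illustrative assumptions.

```python
import numpy as np

def fit_linear_world_model(states, actions, next_states):
    """Fit s' ~ [s, a, 1] @ W by least squares.

    Assumption: a linear model stands in for the learned world model.
    """
    inputs = np.hstack([states, actions, np.ones((len(states), 1))])
    weights, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return weights

def augment_states(weights, states, actions, noise_scale=0.01, rng=None):
    """Return model-predicted next states with small Gaussian noise added.

    Assumption: these synthetic states are mixed with the real ones
    during offline pre-training.
    """
    rng = np.random.default_rng() if rng is None else rng
    inputs = np.hstack([states, actions, np.ones((len(states), 1))])
    predicted = inputs @ weights
    return predicted + noise_scale * rng.standard_normal(predicted.shape)
```

In this sketch, the offline dataset would first be passed to `fit_linear_world_model`, and each pre-training batch could then be extended with the output of `augment_states`.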

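A model-based data augmentation strategy that improves offline reinforcement learning pre-training, reducing the amount of data required while substantially improving online fine-tuning and lowering the number of environment interactions.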

Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline Pre-Training with Model-Based Augmentation

In this paper, we describe a novel approach to imitation learning that infers
latent policies directly from state observations. We introduce a method that
characterizes the causal effects of latent actions on observations while
simultaneously predicting their likelihood. We then outline an action alignment
procedure that leverages a small number of environment interactions to
determine a mapping between the latent and real-world actions. We show that
this corrected labeling can be used for imitating the observed behavior, even
though no expert actions are given. We evaluate our approach within classic
control environments and a platform game and demonstrate that it performs
better than standard approaches. Code for this work is available at
this https URL
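
The action alignment step described above can be illustrated with a small sketch. This is not the authors' procedure: it assumes discrete latent and real actions and uses a simple majority vote over a handful of labelled environment interactions, whereas the paper's mapping may be learned differently. The function name `align_latent_actions` and its inputs are hypothetical.

```python
from collections import Counter, defaultdict

def align_latent_actions(latent_labels, real_actions):
    """Map each latent action to the real action it co-occurs with most often.

    Assumption: each interaction gives one (latent label, real action) pair,
    where the latent label is inferred for a transition whose real action
    is known because the agent took it in the environment.
    """
    counts = defaultdict(Counter)
    for latent, real in zip(latent_labels, real_actions):
        counts[latent][real] += 1
    # Majority vote per latent action.
    return {latent: c.most_common(1)[0][0] for latent, c in counts.items()}
```

For example, if latent action 1 was inferred for three transitions whose real actions were "right", "right", and "left", the sketch would map latent action 1 to "right".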

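This paper proposes a new imitation learning method that infers latent policies directly from state observations. It introduces a method that characterizes the causal effects of latent actions on observations while predicting their likelihood, and then determines a mapping between latent and real-world actions. The method is evaluated in classic control environments and a platform game, where it performs better than standard approaches.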