A key theme in the past decade has been that when large neural networks and
large datasets combine, they can produce remarkable results. In deep
reinforcement learning (RL), this paradigm is commonly made possible through
experience replay, whereby a dataset of past experiences is used to train a
policy or value function. However, unlike in supervised or self-supervised
learning, an RL agent has to collect its own data, which is often limited.
Thus, it is challenging to reap the benefits of deep learning, and even small
neural networks can overfit at the start of training. In this work, we leverage
the tremendous recent progress in generative modeling and propose Synthetic
Experience Replay (SynthER), a diffusion-based approach to arbitrarily upsample
an agent's collected experience. We show that SynthER is an effective method
for training RL agents across offline and online settings. In offline settings,
we observe drastic improvements both when upsampling small offline datasets and
when training larger networks with additional synthetic data. Furthermore,
SynthER enables online agents to train with a much higher update-to-data ratio
than before, leading to a large increase in sample efficiency, without any
algorithmic changes. We believe that synthetic training data could open the
door to realizing the full potential of deep learning for replay-based RL
algorithms from limited data.

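By leveraging generative modeling techniques, we propose Synthetic Experience Replay (SynthER), a diffusion-based approach that effectively improves the sample efficiency of training reinforcement learning agents when data is limited, and that opens the door to using synthetic data to realize the benefits of deep learning in replay-based learning algorithms.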
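
To make the idea of upsampling collected experience concrete, the following is a minimal sketch, not the SynthER implementation. It assumes transitions are stored as flat vectors, and it swaps the diffusion model for a simple multivariate Gaussian so that the example stays self-contained. The function names (`fit_gaussian`, `upsample`, `sample_mixed_batch`), the toy dimensions, and the `real_fraction` parameter are all hypothetical and chosen for illustration only.

```python
import numpy as np


def fit_gaussian(transitions):
    """Stand-in generative model (NOT a diffusion model): fit mean and covariance."""
    mean = transitions.mean(axis=0)
    # A small diagonal term keeps the covariance well conditioned.
    cov = np.cov(transitions, rowvar=False) + 1e-6 * np.eye(transitions.shape[1])
    return mean, cov


def upsample(transitions, num_synthetic, rng):
    """Draw an arbitrary number of synthetic transitions from the fitted model."""
    mean, cov = fit_gaussian(transitions)
    return rng.multivariate_normal(mean, cov, size=num_synthetic)


def sample_mixed_batch(real, synthetic, batch_size, real_fraction, rng):
    """Build a training batch from real and synthetic rows (assumed mixing scheme)."""
    n_real = int(round(batch_size * real_fraction))
    n_syn = batch_size - n_real
    real_idx = rng.integers(0, len(real), size=n_real)
    syn_idx = rng.integers(0, len(synthetic), size=n_syn)
    return np.concatenate([real[real_idx], synthetic[syn_idx]], axis=0)


rng = np.random.default_rng(0)
# Toy "collected experience": 200 flattened transitions of dimension 6.
real = rng.normal(size=(200, 6))
# Upsample the small dataset by a factor of 10.
synthetic = upsample(real, 2000, rng)
batch = sample_mixed_batch(real, synthetic, batch_size=64, real_fraction=0.5, rng=rng)
```

In this sketch the agent's update rule is untouched: only the source of the training batch changes, which is consistent with the claim that no algorithmic changes are needed. A larger synthetic pool is also what would let an agent take more gradient updates per collected transition.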