Learning policies on data synthesized by models can in principle quench the
thirst of reinforcement learning algorithms for large amounts of real
experience, which is often costly to acquire. However, simulating plausible
experience de novo is a hard problem for many complex environmen