A key theme in the past decade has been that when large neural networks and
large datasets combine they can produce remarkable results. In deep
reinforcement learning (RL), this paradigm is commonly made possible through
experience replay, whereby a dataset of past experiences is used to train a
policy or value function. However, unlike in supervised or self