Model-based reinforcement learning uses a model to plan, so an agent's predictions and policies can be improved with more computation rather than more data from the environment, which improves sample efficiency. However, learning an accurate model is hard. A natural question, then, is whether model-free methods can obtain benefits similar to those of planning. Experience replay is an essential component of many model-free algorithms: it enables sample-efficient and stable learning by storing past experiences for reuse in gradient computation. Prior work has connected models and experience replay by planning with the latter, which amounts to increasing the number of times a mini-batch is sampled and used for updates at each step (the amount of replay per step). We exploit this connection with a systematic study of the effect of varying the amount of replay per step in a well-known model-free algorithm, Deep Q-Network (DQN), in the Mountain Car environment. We show empirically that increasing replay improves DQN's sample efficiency, reduces the variation in its performance, and makes it more robust to changes in hyperparameters. Altogether, this takes a step toward a better algorithm for deployment.
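
To make "amount of replay per step" concrete, the following is a minimal sketch, not the study's implementation. It swaps in tabular Q-learning on a hypothetical 5-state chain in place of DQN on Mountain Car, so that it runs with the standard library alone. The function name `train`, the environment, and all hyperparameter values are assumptions chosen for illustration. The only point it shows is the inner loop: after each environment step, a mini-batch is sampled from the buffer and used for an update `replay_per_step` times.

```python
import random
from collections import deque


def train(replay_per_step, env_steps=200, batch_size=8, seed=0):
    """Tabular Q-learning with experience replay on a toy 5-state chain.

    Illustrative stand-in for DQN: a Q-table replaces the network and a
    chain replaces Mountain Car. Returns (q_table, number_of_updates).
    """
    rng = random.Random(seed)
    n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
    gamma, alpha, epsilon = 0.9, 0.1, 0.1  # assumed values, not from the paper
    q = [[0.0] * n_actions for _ in range(n_states)]
    buffer = deque(maxlen=1000)           # bounded replay buffer
    updates = 0
    state = 0

    for _ in range(env_steps):
        # Epsilon-greedy action selection, breaking ties at random.
        if rng.random() < epsilon or q[state][0] == q[state][1]:
            action = rng.randrange(n_actions)
        else:
            action = 0 if q[state][0] > q[state][1] else 1

        # One step of real experience: reward 1 on reaching the last state.
        next_state = max(0, state - 1) if action == 0 else state + 1
        done = next_state == n_states - 1
        reward = 1.0 if done else 0.0
        buffer.append((state, action, reward, next_state, done))
        state = 0 if done else next_state

        # Replay: sample and apply a mini-batch `replay_per_step` times.
        for _ in range(replay_per_step):
            batch = [rng.choice(buffer) for _ in range(batch_size)]
            for s, a, r, s2, d in batch:
                target = r if d else r + gamma * max(q[s2])
                q[s][a] += alpha * (target - q[s][a])
            updates += 1

    return q, updates


if __name__ == "__main__":
    for k in (1, 4, 16):
        q_table, n_updates = train(replay_per_step=k)
        print(k, n_updates, round(max(q_table[0]), 3))
```

With the same number of environment steps, a larger `replay_per_step` performs proportionally more updates from the same data, which is the sense in which replay trades computation for samples.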

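Starting from the connection between experience replay and models, this study systematically investigates how varying the amount of replay affects the sample efficiency and robustness of the Deep Q-Network algorithm. In the Mountain Car environment, it finds that more replay improves sample efficiency, reduces variation in performance, and improves robustness, offering new ideas for applying the algorithm in practice.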

Understanding the Effect of Varying Amounts of Replay per Step