In this paper, we investigate the problem of overfitting in deep reinforcement learning. Among the most common benchmarks in RL, it is customary to use the same environments for both training and testing. This practice offers relatively little insight into an agent's ability to generalize. We address this issue by using procedurally generated environments to construct distinct training and test sets. Most notably, we introduce a new environment called CoinRun, designed as a benchmark for generalization in RL. Using CoinRun, we find that agents overfit to surprisingly large training sets. We then show that deeper convolutional architectures improve generalization, as do methods traditionally found in supervised learning, including L2 regularization, dropout, data augmentation and batch normalization.

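This paper studies overfitting in deep reinforcement learning and uses procedurally generated environments to construct distinct training and test sets. It introduces a new environment, CoinRun, as a benchmark for generalization in RL. Using CoinRun, the authors find that agents overfit even to quite large training sets. They also show that deeper convolutional architectures, together with methods from traditional supervised learning (L2 regularization, dropout, data augmentation, and batch normalization), improve generalization.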
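The idea of building distinct training and test sets from a procedural generator can be illustrated with a minimal sketch. It assumes each level is fully determined by an integer seed; `make_level`, `split_level_seeds`, and `generalization_gap` are hypothetical names chosen for illustration and are not part of CoinRun's actual API.

```python
import random


def make_level(seed: int, length: int = 8) -> list[int]:
    """Hypothetical stand-in for a procedural generator.

    The same seed always yields the same level layout (here, a list of tile ids).
    """
    rng = random.Random(seed)
    return [rng.randint(0, 3) for _ in range(length)]


def split_level_seeds(num_train: int, num_test: int, master_seed: int = 0):
    """Draw disjoint sets of level seeds for training and testing."""
    rng = random.Random(master_seed)
    # Sampling without replacement guarantees that no seed appears in both sets.
    seeds = rng.sample(range(2**31), num_train + num_test)
    return seeds[:num_train], seeds[num_train:]


def generalization_gap(train_returns: list[float], test_returns: list[float]) -> float:
    """Mean return on training levels minus mean return on held-out levels."""
    train_mean = sum(train_returns) / len(train_returns)
    test_mean = sum(test_returns) / len(test_returns)
    return train_mean - test_mean


train_seeds, test_seeds = split_level_seeds(num_train=500, num_test=100)
train_levels = [make_level(s) for s in train_seeds]
test_levels = [make_level(s) for s in test_seeds]
```

Under these assumptions, an agent would be trained only on `train_levels` and evaluated on both sets. A large positive `generalization_gap` would then indicate overfitting to the training levels.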

Quantifying Generalization in Reinforcement Learning