In this paper, we investigate the problem of overfitting in deep
reinforcement learning (RL). In the most common RL benchmarks, it is customary
to use the same environments for both training and testing. This practice
offers relatively little insight into an agent's ability to generalize. We
address this issue by using procedurally generated environments to construct
distinct training and test sets. Most notably, we introduce a new environment
called CoinRun, designed as a benchmark for generalization in RL. Using
CoinRun, we find that agents overfit to surprisingly large training sets. We
then show that deeper convolutional architectures improve generalization, as do
methods traditionally found in supervised learning, including L2
regularization, dropout, data augmentation and batch normalization.

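This paper studies overfitting in deep reinforcement learning and uses procedurally generated environments to construct distinct training and test sets. It introduces a new environment called CoinRun as a benchmark for generalization in RL. Using CoinRun, the authors find that agents overfit to fairly large training sets. They also show that deeper convolutional architectures, together with methods traditionally used in supervised learning (L2 regularization, dropout, data augmentation, and batch normalization), improve generalization.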
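
The separation of training and test levels described above can be illustrated with a minimal sketch. It is not the authors' implementation: `generate_level`, `split_level_seeds`, and `generalization_gap` are hypothetical names, the toy generator is only a stand-in for a real procedural level generator, and splitting by disjoint seed ranges is an assumption made for illustration.

```python
import random


def generate_level(seed, width=16):
    """Hypothetical stand-in for a procedural level generator.

    The same seed always yields the same layout, so a set of seeds
    defines a fixed set of levels.
    """
    rng = random.Random(seed)
    # 0 = free tile, 1 = obstacle; a real generator would be far richer.
    return tuple(rng.randint(0, 1) for _ in range(width))


def split_level_seeds(num_train, num_test):
    """Return disjoint seed lists so no test level is seen during training.

    Assumption: disjoint integer ranges are used purely for illustration.
    """
    train_seeds = list(range(num_train))
    test_seeds = list(range(num_train, num_train + num_test))
    return train_seeds, test_seeds


def generalization_gap(train_score, test_score):
    """Difference between training and test performance.

    A large positive gap suggests the agent has overfit to its training levels.
    """
    return train_score - test_score


train_seeds, test_seeds = split_level_seeds(num_train=500, num_test=100)
train_levels = [generate_level(s) for s in train_seeds]
test_levels = [generate_level(s) for s in test_seeds]
```

An agent would be trained only on `train_levels` and evaluated on both sets; comparing the two scores with `generalization_gap` gives a simple measure of overfitting.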