Over the last few years, deep reinforcement learning (RL) has shown impressive results in a variety of domains, learning directly from high-dimensional sensory streams. However, when networks are trained in a fixed environment, such as a single level in a video game, it will usually overfit and fail to generalize to new levels. When RL agents overfit, even slight modifications to the environment can result in poor agent performance. In this paper, we present an approach to prevent overfitting by generating more general agent controllers, through training the agent on a completely new and procedurally generated level each episode. The level generator generate levels whose difficulty slowly increases in response to the observed performance of the agent. Our results show that this approach can learn policies that generalize better to other procedurally generated levels, compared to policies trained on fixed levels.

本文探讨了通过在训练中使用过程化生成的关卡如何增加模型的泛化性能，并研究了其与人类设计的关卡的关系。结果表明，通过降低难度、调整关卡设计，可以获得更好的性能表现，并进行了降维和聚类分析来评估关卡生成器的分布。

通过程序化关卡生成，照亮深度强化学习的泛化问题