We introduce Autoverse, an evolvable, domain-specific language for single-player 2D grid-based games, and demonstrate its use as a scalable training ground for Open-Ended Learning (OEL) algorithms. Autoverse uses cellular-automaton-like rewrite rules to describe game mechanics, allowing it to express a variety of game environments (e.g., mazes, dungeons, Sokoban puzzles) that are popular testbeds for Reinforcement Learning (RL) agents. Each rewrite rule can be expressed as a series of simple convolutions, so environments can be parallelized on the GPU, drastically accelerating RL training. Using Autoverse, we propose jump-starting open-ended learning with imitation learning from search. In this approach, we first evolve Autoverse environments (their rules and initial map topology) to maximize the number of iterations required by greedy tree search to discover a new best solution, producing a curriculum of increasingly complex environments and playtraces. We then distill these expert playtraces into a neural-network-based policy using imitation learning. Finally, we use the learned policy as a starting point for open-ended RL, in which new training environments are continually evolved to maximize the RL player agent's value function error (a proxy for its regret, or for the learnability of the generated environments). We find that this approach improves the performance and generality of the resulting player agents.
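To make the rewrite-rule idea concrete, the following is a minimal sketch of how a single rule might be matched by sliding a one-hot pattern kernel over the grid (a cross-correlation) and then applied. It is an illustration only, not Autoverse's actual implementation: the tile set (`EMPTY`, `WALL`, `PLAYER`), the helper names `one_hot`, `match_rule`, and `apply_rule`, and the "move right" rule are all assumptions introduced here, and plain Python loops stand in for the GPU convolution the abstract refers to.

```python
# Hypothetical tile types; a real Autoverse game defines its own.
EMPTY, WALL, PLAYER = 0, 1, 2
N_TILES = 3


def one_hot(grid):
    """Encode the grid as one binary channel per tile type."""
    return [[[1 if cell == t else 0 for cell in row] for row in grid]
            for t in range(N_TILES)]


def match_rule(grid, pattern):
    """Return top-left (row, col) positions where `pattern` occurs.

    The pattern acts as a kernel holding a 1 at (tile, dr, dc) for each
    pattern cell. Sliding it over the one-hot channels and summing gives
    an activation that equals the pattern size exactly where every cell
    matches.
    """
    channels = one_hot(grid)
    ph, pw = len(pattern), len(pattern[0])
    h, w = len(grid), len(grid[0])
    matches = []
    for r in range(h - ph + 1):
        for c in range(w - pw + 1):
            activation = sum(channels[pattern[dr][dc]][r + dr][c + dc]
                             for dr in range(ph) for dc in range(pw))
            if activation == ph * pw:
                matches.append((r, c))
    return matches


def apply_rule(grid, in_pattern, out_pattern):
    """Write `out_pattern` wherever `in_pattern` matched; return a new grid."""
    new_grid = [row[:] for row in grid]
    for r, c in match_rule(grid, in_pattern):
        for dr, row in enumerate(out_pattern):
            for dc, tile in enumerate(row):
                new_grid[r + dr][c + dc] = tile
    return new_grid


# Assumed example rule: the player steps right into an empty cell.
MOVE_RIGHT_IN = [[PLAYER, EMPTY]]
MOVE_RIGHT_OUT = [[EMPTY, PLAYER]]

grid = [
    [WALL, WALL, WALL, WALL],
    [WALL, PLAYER, EMPTY, WALL],
    [WALL, WALL, WALL, WALL],
]
stepped = apply_rule(grid, MOVE_RIGHT_IN, MOVE_RIGHT_OUT)
blocked = apply_rule(stepped, MOVE_RIGHT_IN, MOVE_RIGHT_OUT)
```

Here `stepped` has the player one cell to the right, and `blocked` is identical to `stepped` because the player now faces a wall and the pattern no longer matches. Since matching is a sum of elementwise products between one-hot channels and a fixed kernel, the same computation can in principle be batched over many environments with a convolution primitive on an accelerator.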

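Autoverse is an evolvable domain-specific language for single-player 2D grid-based games that can serve as a scalable training ground for open-ended learning algorithms. By describing game mechanics with cellular-automaton-like rewrite rules, Autoverse can express a variety of game environments (e.g., mazes, dungeons, Sokoban puzzles) that are common benchmarks for reinforcement learning agents. We propose using Autoverse to jump-start open-ended learning with imitation learning from search. We evolve Autoverse environments (their rules and initial map topology) to maximize the number of iterations required by greedy tree search, producing a curriculum of increasingly complex environments and playtraces. We then use imitation learning to distill these expert playtraces into a neural-network-based policy. Finally, we use the learned policy as the starting point for open-ended reinforcement learning, continually evolving new training environments to maximize the RL agent's value function error (a proxy for the learnability of the generated environments), which improves the generality of the resulting agents.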

Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents