Deep Reinforcement Learning achieves very good results in domains where
reward functions can be manually engineered. At the same time, there is growing
interest within the community in using games based on Procedural Content
Generation (PCG) as benchmark environments, since this type of environment is
well suited to studying overfitting and generalization of agents under domain
shift. Inverse Reinforcement Learning (IRL) can instead extrapolate reward
functions from expert demonstrations, with good results even on
high-dimensional problems; however, there are no examples of applying these
techniques to procedurally generated environments. This is mostly due to the
large number of demonstrations needed to find a good reward model. We propose a
technique based on Adversarial Inverse Reinforcement Learning which can
significantly decrease the need for expert demonstrations in PCG games. Through
the use of an environment with a limited set of initial seed levels, plus some
modifications to stabilize training, we show that our approach, DE-AIRL, is
demonstration-efficient and still able to extrapolate reward functions which
generalize to the fully procedural domain. We demonstrate the effectiveness of
our technique on two procedural environments, MiniGrid and DeepCrawl, for a
variety of tasks.

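Using a limited set of initial seed levels and some modifications that stabilize training, we propose a technique based on Adversarial Inverse Reinforcement Learning, named DE-AIRL, which significantly reduces the need for expert demonstrations while still extrapolating reward functions to the fully procedural domain. We demonstrate the effectiveness of our technique on two procedural environments, MiniGrid and DeepCrawl.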