Reinforcement learning has been widely successful in producing agents capable
of playing games at a human level. However, this requires complex reward
engineering, and the agent's resulting policy is often unpredictable. Going
beyond reinforcement learning is necessary to model a wide range of human
playstyles, which can be difficult to represent with a reward function. This
paper presents a novel imitation learning approach to generate multiple persona
policies for playtesting. Multimodal Generative Adversarial Imitation Learning
(MultiGAIL) uses an auxiliary input parameter to learn distinct personas with
a single agent model. MultiGAIL is based on generative adversarial imitation
learning and uses multiple discriminators as reward models, inferring the
environment reward by comparing the agent's policy with distinct expert policies. The
reward from each discriminator is weighted according to the auxiliary input.
Our experimental analysis demonstrates the effectiveness of our technique in
two environments with continuous and discrete action spaces.
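
The reward weighting described above can be illustrated with a minimal sketch.
It is not the authors' implementation. It assumes that each discriminator
outputs a probability in [0, 1) that a state-action pair came from its expert,
that the per-discriminator reward takes the common GAIL-style form
-log(1 - D), and that the auxiliary input is supplied as one weight per
persona. The names discriminator_reward and combined_reward are hypothetical.

```python
import math


def discriminator_reward(d_out, eps=1e-8):
    """GAIL-style reward from one discriminator output.

    Assumption: d_out is the probability that the sample is expert-like,
    and the reward form -log(1 - D) is one common choice, not necessarily
    the one used in the paper.
    """
    return -math.log(1.0 - d_out + eps)


def combined_reward(d_outputs, aux_weights):
    """Weight each discriminator's reward by the auxiliary input.

    d_outputs:   one discriminator output per persona.
    aux_weights: one weight per persona (the assumed auxiliary input).
    """
    if len(d_outputs) != len(aux_weights):
        raise ValueError("need exactly one weight per discriminator")
    return sum(w * discriminator_reward(d)
               for d, w in zip(d_outputs, aux_weights))


# Two personas: the agent currently resembles persona 1 more than persona 0.
d_outputs = [0.2, 0.9]
print(combined_reward(d_outputs, [1.0, 0.0]))  # reward when persona 0 is requested
print(combined_reward(d_outputs, [0.0, 1.0]))  # reward when persona 1 is requested
print(combined_reward(d_outputs, [0.5, 0.5]))  # reward for an equal blend
```

Under these assumptions, changing the weights changes which expert the agent is
rewarded for resembling, so one model can be steered between personas.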

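This paper proposes a novel imitation learning method for generating multiple
persona policies for playtesting: Multimodal Generative Adversarial Imitation
Learning (MultiGAIL). It uses an auxiliary input parameter to learn distinct
persona policies and, building on generative adversarial imitation learning,
uses multiple discriminators as reward models. It infers the environment reward
by comparing the agent with distinct expert policies, and it weights the reward
from each discriminator according to the auxiliary input. Experimental analysis
demonstrates the effectiveness of our technique in two environments with
continuous and discrete action spaces.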