Crowd simulation is important for video game design because it makes it possible to populate virtual worlds with autonomous avatars that navigate in a human-like manner. Reinforcement learning has shown great potential for simulating virtual crowds, but the design of the reward function is critical to achieving effective and efficient results. In this work, we explore the design of reward functions for reinforcement learning-based crowd simulation. We provide theoretical insights into the validity of certain reward functions based on their analytical properties, and we evaluate them empirically across a range of scenarios, using energy efficiency as the metric. Our experiments show that directly minimizing energy usage is a viable strategy as long as it is paired with an appropriately scaled guiding potential, and they enable us to study the impact of the different reward components on the behavior of the simulated crowd. Our findings can inform the development of new crowd simulation techniques and contribute to the wider study of human-like navigation.

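Crowd simulation is important for game design because it populates virtual worlds with autonomous characters that navigate in a human-like manner. This paper explores the design of reward functions for reinforcement learning-based crowd simulation, explains theoretically the validity of particular reward functions according to their analytical properties, and evaluates them empirically on a range of scenarios with energy efficiency as the metric. The experimental results show that directly minimizing energy consumption on top of an appropriately scaled guiding potential is a viable strategy, which allows us to study the impact of the different reward components on the behavior of the simulated crowd. Our findings can inform the development of new crowd simulation techniques and contribute to further research on human-like navigation.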

Designing Reward Functions for Crowd Simulation via Reinforcement Learning