Inverse reinforcement learning (IRL) is the problem of recovering a system's latent reward function from observed system behavior. In this paper, we concentrate on IRL in homogeneous large-scale systems, which we refer to as swarms. We show that, by exploiting the inherent homogeneity of a swarm, the IRL objective can be reduced to an equivalent single-agent formulation of constant complexity, which allows us to decompose a global system objective into local subgoals at the agent level. Based on this finding, we reformulate the corresponding optimal control problem as a fixed-point problem pointing towards a symmetric Nash equilibrium, which we solve using a novel heterogeneous learning scheme particularly tailored to the swarm setting. Results on the Vicsek model and the Ising model demonstrate that the proposed framework can produce meaningful reward models from which we can learn near-optimal local controllers that replicate the observed system dynamics.

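This paper proposes an inverse reinforcement learning algorithm for distributed multi-agent interaction, based on the SwarMDP framework. Within this framework, we show that the value functions associated with the agents are equal, and, by introducing a novel heterogeneous learning scheme, we show that the framework can effectively produce meaningful local reward models.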

Inverse Reinforcement Learning in Swarm Systems