Many applications, e.g., in shared mobility, require coordinating a large number of agents. Mean-field reinforcement learning addresses the resulting scalability challenge by optimizing the policy of a representative agent. In this paper, we address an important generalization where th