Mixed incentives in a population of multiagent teams have been shown to
have advantages over a fully cooperative system; however, discovering the best
mixture of incentives or team structure is a difficult and dynamic problem. We
propose a framework where individual learning agents self-regulate their
configuration of incentives through various parts of their reward function.
This work extends previous work by giving agents the ability to dynamically
update their group alignment during learning and by allowing teammates to have
different group alignments. Our model builds on ideas from hierarchical
reinforcement learning and meta-learning to learn the configuration of a reward
function that supports the development of a behavioral policy. We provide
preliminary results in a commonly studied multiagent environment and find that
agents can achieve better global outcomes by self-tuning their respective group
alignment parameters.
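The abstract does not give the exact form of the reward function or of the update rule for the group alignment parameters. The sketch below is a minimal illustration under two hypothetical assumptions. First, each agent's shaped reward is a linear blend of its own reward and its team's mean reward, weighted by a per-agent alignment parameter `alpha` in [0, 1]. Second, an outer loop tunes `alpha` by simple hill climbing on a global-outcome score, standing in for the meta-learning step. The names `mixed_reward`, `tune_alpha`, and `evaluate` are illustrative and are not the authors' method.

```python
from typing import Callable, List


def mixed_reward(own: float, team_rewards: List[float], alpha: float) -> float:
    """Blend an agent's own reward with its team's mean reward.

    Hypothetical linear form: alpha = 0 is fully self-interested,
    alpha = 1 is fully team-aligned.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    team_mean = sum(team_rewards) / len(team_rewards)
    return (1.0 - alpha) * own + alpha * team_mean


def tune_alpha(alpha: float, evaluate: Callable[[float], float],
               step: float = 0.1) -> float:
    """One hill-climbing step on alpha (a stand-in for the meta-learning step).

    Tries the current value and its neighbours (clipped to [0, 1]) and keeps
    whichever scores highest under `evaluate`, a hypothetical callable that
    returns the global outcome achieved with a given alpha. On ties, max()
    keeps the first candidate, i.e. the current alpha.
    """
    candidates = [alpha, min(1.0, alpha + step), max(0.0, alpha - step)]
    return max(candidates, key=evaluate)


if __name__ == "__main__":
    # Toy global-outcome score that peaks at alpha = 0.7 (purely illustrative).
    def score(a: float) -> float:
        return -(a - 0.7) ** 2

    alpha = 0.0
    for _ in range(10):
        alpha = tune_alpha(alpha, score)
    print(round(alpha, 2))
```

In a full system, each agent would hold its own `alpha`, so teammates could settle on different alignments, and `evaluate` would come from rollouts of the inner behavioral policy rather than a closed-form function.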

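Using mixed incentives in multiagent teams has advantages. The authors propose a framework in which learning agents self-regulate their configuration of incentives through different parts of their reward function. Their model builds on ideas from hierarchical reinforcement learning and meta-learning to learn a reward-function configuration that supports the development of a behavioral policy. Preliminary results show that agents can achieve better global outcomes by self-tuning their respective group alignment parameters.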